Learning method for a learning apparatus.

A learning method for adjusting control parameters of a learning apparatus in which input data is
converted into output data by utilizing the control parameters and teaching signal data consists of steps of
defining undesirable output data as a type of the teaching signal data, defining a loss function $r = \exp(-x^2)$ of
which a value decreases along with the increase of a difference x between the undesirable output data and the
output data, providing both the input data and the undesirable output data to the learning apparatus, calculating a
value of the output data by converting a value of the input data by utilizing values of the control parameters,
calculating a value of the loss function r by utilizing both the value of the output data and a value of the
undesirable output data, iteratively calculating the value of the loss function r by renewing the values of the
control parameters to decrease the value of the loss function r, and adjusting the control parameters of the
learning apparatus to the newest values of the control parameters when the value of the loss function is less than
a prescribed value.
BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a learning method for training a learning apparatus
in which an artificial neural network is utilized to determine the most appropriate control parameters by
calculating a value of a loss function, and in which a non-linear data conversion is implemented.

2. Description of Background 

Recently, learning apparatus utilizing an artificial neural network has attracted considerable attention in 
performing non-linear data conversion. The artificial neural network has been proposed as an artificial model 
of the neural circuits in brain cells, so that the artificial neural network has a learning function and non-linear 
performance. The learning apparatus has therefore been utilized to recognize many types of patterns, to 
predict stock prices, to diagnose illness, to control the actions of robots, and the like.

To utilize the above learning apparatus, control parameters of the learning apparatus, which exert an 
effect on the movement of the learning apparatus, must be adjusted in advance according to the purpose to 
provide proper movement. In other words, the learning apparatus must learn an operation in advance before 
practical operations are performed. 
A supervisory learning method is known as one learning method that utilizes an artificial neural
network to adjust the control parameters of the learning apparatus. In supervisory learning methods
such as back-propagation learning, training example data composed of both input data and teaching signal
data is provided to the learning apparatus. The teaching signal data is utilized to draw the output data toward
the desired data. That is, control parameters such as weight parameters in the artificial neural network are
iteratively adjusted so as to decrease the value of a prescribed loss function.

For example, the desired output data is provided for the conventional learning apparatus as a type of 
teaching signal data. In this case, the output data should be close to the desired output data. Therefore, the 
learning apparatus learns to decrease the difference between the output data and the desired output data. 
The above supervisory learning method can also be applied to a learning apparatus in which data is
converted non-linearly without utilizing an artificial neural network.

On the other hand, in cases where the movements of a robot's arms are specified in a robot control
program utilizing the learning apparatus, the movements of the arms must be limited to avoid many
obstacles. Therefore, many forbidden conditions are provided to the robot. Moreover, in cases where a
doctor identifies the disease of a patient in a medical diagnosis after observing X-ray
photographs or electrocardiograms, the doctor utilizes a process of elimination to rule out doubtful diseases.

Accordingly, in industrial fields utilizing the learning apparatus, there are a large number of cases in 
which undesirable output data which is not acceptable as output data is provided to the learning apparatus. 
This undesirable output data is also another type of teaching signal data. 

However, a conventional learning apparatus cannot learn to produce output data which
is very different from the specified undesirable output data in cases where such undesirable output data is
provided to the learning apparatus. Therefore, an operator on receiving undesirable output data must create
the desired output data from the undesirable output data by relying on his experience or on incomplete
knowledge. Thereafter, the desired output data determined by the operator is provided to the learning
apparatus, and the control parameters of the learning apparatus are adjusted to decrease the difference
between the output data and the desired output data.

Accordingly, it is very troublesome to have the learning apparatus learn by utilizing the desired output 
data in cases where the operator receives the undesirable output data. 

Moreover, the output data must converge on the desired output data regardless of the fact that the
desired output data is incomplete in cases where incomplete desired output data is provided to the learning
apparatus.

Therefore, it is sometimes impossible to cause the output data to converge on the desired output data. 
Moreover, when many items of desired output data are provided to the learning apparatus, the control 
parameters of the learning apparatus are sometimes adjusted so as to converge on limited items of desired 
output data. 

In cases where undesirable output data is provided to the learning apparatus, the output data is allowed
to converge over a large range so that convergence of the output data is easily accomplished. Moreover,
well-balanced output data is obtained by providing many items of undesirable output data to the learning
apparatus because the output data is not attracted to limited items of undesirable output data. That is, the
output data is positioned far from all items of undesirable output data.

On the other hand, in cases where the output data is quantitative, it is sometimes necessary to have the
learning apparatus learn to converge on one of several regions which are divided by a boundary. For
example, in cases where the output data is expected to be more than a prescribed temperature or a
prescribed pressure, the teaching signal data designates the prescribed temperature or the prescribed
pressure. Moreover, the arms of a robot are prohibited from passing through to the opposite side of a wall in
the robot control program. In this case, the teaching signal data designates the wall. In short, learning
represented by an inequality such as "the output data > the teaching signal data" is sometimes
necessary.

However, the operator must create the desired output data to designate the boundary in the conventional
supervisory learning method. It is therefore very troublesome to have the learning apparatus learn by
utilizing the desired output data in cases where it is desired that the output data converges on one side of a
region.

SUMMARY OF THE INVENTION

A first object of the present invention is to provide, with due consideration to the drawbacks of such
conventional learning methods, a learning method in which control parameters of a learning apparatus are
easily adjusted to provide well-balanced output data without converging on a limited number of the items
of teaching signal data.

A second object of the present invention is to provide a learning method in which the control
parameters of a learning apparatus are easily adjusted so that output data converges on one side of regions
which are divided by a boundary designated by teaching signal data.

The first object is achieved by the provision of a learning method for adjusting control parameters of a
learning apparatus in which input data is converted into output data by utilizing the control parameters and
teaching signal data, comprising steps of:

defining undesirable output data which is not acceptable as output data, the undesirable output data 
being a type of the teaching signal data; 

defining a loss function r of which a value decreases along with the increase of a difference x between
the undesirable output data and the output data, the loss function r being designated by an equation
$r = \exp(-x^2)$;

providing both the input data and the undesirable output data to the learning apparatus;

calculating a value of the output data by converting a value of the input data by utilizing values of the 
control parameters; 

calculating a value of the loss function r by utilizing both the value of the output data and a value of the
undesirable output data;

iteratively calculating the value of the loss function r by renewing the values of the control parameters 
to decrease the value of the loss function until the value of the loss function is less than a prescribed value 
in cases where the value of the loss function is equal to or greater than the prescribed value; and 

adjusting the control parameters of the learning apparatus to the newest values of the control parameters
when the value of the loss function is less than the prescribed value.

In the above steps, the loss function is gradually decreased by iteratively calculating the value thereof 
while the values of the control parameters are changed. Therefore, the difference between the undesirable 
output data and the output data is increased. In other words, the output data is compulsorily shifted far from 
45 the undesirable output data. 

Accordingly, the control parameters of the learning apparatus can be adjusted to shift the output data 
far from the undesirable output data. In other words, the undesirable output data is not provided as the 
output data in the learning apparatus. 
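
The steps above can be illustrated with a minimal sketch. It assumes a single scalar input and a hypothetical linear model (output = w*u + b) standing in for the learning apparatus, and it assumes plain gradient descent as the way of renewing the control parameters. The names `repulsion_loss` and `train_away_from`, the learning rate, and the prescribed value 0.01 are illustrative choices, not taken from the text.

```python
import math


def repulsion_loss(output, undesired):
    """Loss r = exp(-x^2), where x is the difference from the undesirable output."""
    x = output - undesired
    return math.exp(-x * x)


def train_away_from(u, undesired, w=0.5, b=0.1, lr=0.5,
                    threshold=0.01, max_iter=1000):
    """Renew (w, b) until r falls below the prescribed value (assumed: gradient descent)."""
    for _ in range(max_iter):
        output = w * u + b                 # convert the input with the current parameters
        r = repulsion_loss(output, undesired)
        if r < threshold:                  # stop once r is less than the prescribed value
            break
        x = output - undesired
        dr_doutput = -2.0 * x * r          # derivative of exp(-x^2) with respect to the output
        w -= lr * dr_doutput * u           # chain rule: d(output)/dw = u
        b -= lr * dr_doutput               # chain rule: d(output)/db = 1
    return w, b, repulsion_loss(w * u + b, undesired)


w, b, r = train_away_from(u=1.0, undesired=0.5)
print(r < 0.01, abs(w * 1.0 + b - 0.5) > 2.0)
```

Because exp(-x^2) < 0.01 requires |x| > 2.14, the loop ends with the output more than two units away from the undesirable value. Note that if the initial difference x is exactly zero the gradient vanishes, so the sketch starts from a small non-zero difference.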

The first object is also achieved by the provision of a learning method for adjusting both weight
parameters and threshold values of a learning apparatus by utilizing teaching signal data and a plurality of
neurons interconnected in an artificial neural network in which input data provided to first neurons of a first
stage is weighted by the weight parameters and the threshold values are subtracted from the weighted input
data so that the weighted input data is transmitted to final neurons of a final stage through hidden stages in
which the data is weighted with the weight parameters, reduced by the threshold values, and converted by
applying a prescribed monotone increasing function, after which output data is provided from the final
neurons, comprising steps of:

defining undesirable output data which is not acceptable as output data, the undesirable output data 
being a type of the teaching signal data; 

defining a loss function which decreases along with the increase of the difference between the
undesirable output data and the output data;

providing the input data to the first neurons; 

providing the undesirable output data to the learning apparatus; 

calculating a value of the output data obtained by converting the input data while utilizing the weight
parameters, the threshold values, and the monotone increasing function;

calculating a value of the loss function by utilizing both the value of the output data and a value of the 
undesirable output data; 

iteratively calculating the value of the loss function by renewing the weight parameters and the
threshold values to decrease the value of the loss function until the value of the loss function is less than a
prescribed value in cases where the value of the loss function is equal to or greater than the prescribed
value; and

adjusting the weight parameters and the threshold values of the learning apparatus to the newest weight
parameters and the newest threshold values when the value of the loss function is less than the prescribed
value.

In the above steps, because the learning method utilizes the artificial neural network, the input data 
provided for the first neurons is automatically transmitted to the final neurons according to predetermined 
calculations so that the input data is automatically converted to the output data. The calculations in the 
artificial neural network are also applied to a non-linear data conversion. For example, because the input 
data is converted in the hidden and final neurons according to the prescribed monotone increasing function,
the input data is non-linearly converted to the output data. 
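
The conversion described above can be sketched as a forward pass. This is a minimal illustration under stated assumptions: the sigmoid is used as the prescribed monotone increasing function (the text does not name one here), there is a single hidden stage, and the function names `sigmoid`, `stage`, and `forward` are hypothetical.

```python
import math


def sigmoid(z):
    """An assumed monotone increasing function; the text only requires monotonicity."""
    return 1.0 / (1.0 + math.exp(-z))


def stage(inputs, weights, thresholds, f=sigmoid):
    """Weight the inputs, subtract each neuron's threshold, then apply f."""
    return [
        f(sum(w * v for w, v in zip(row, inputs)) - th)
        for row, th in zip(weights, thresholds)
    ]


def forward(inputs, hidden_w, hidden_th, final_w, final_th):
    """Transmit the input data through one hidden stage to the final neurons."""
    hidden = stage(inputs, hidden_w, hidden_th)
    return stage(hidden, final_w, final_th)


# Two inputs, two hidden neurons, one final neuron.
out = forward([0.0, 0.0],
              hidden_w=[[1.0, 1.0], [1.0, 1.0]], hidden_th=[0.0, 0.0],
              final_w=[[1.0, 1.0]], final_th=[1.0])
print(out)
```

With zero inputs both hidden neurons output 0.5, their weighted sum equals the final threshold, and the final neuron therefore outputs sigmoid(0) = 0.5.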

In detail, the loss function is gradually decreased by iteratively calculating the value thereof while the 
weight parameters and the threshold values are changed. Therefore, the difference between the undesirable 
output data and the output data is increased. In other words, the output data is compulsorily shifted far from 
25 the undesirable output data. 

Accordingly, the weight parameters and the threshold values of the learning apparatus can be adjusted 
to shift the output data far from the undesirable output data. In other words, the undesirable output data is 
not provided as the output data in the learning apparatus. 

The first object is further achieved by the provision of a learning method for adjusting control
parameters of a learning apparatus in which input data is converted into output data by utilizing the control
parameters and teaching signal data, comprising steps of:

defining undesirable output data which is not acceptable as output data and is a type of the teaching 
signal data; 

defining desired output data which is desired as output data and is another type of teaching signal data,
the desired output data occupying a region s;

defining a distribution function s*g(x+) of the desired output data by utilizing both the region s occupied
by the desired output data and a normal distribution g(x) which is defined by both a variance $\sigma^2$ and a
difference x+ between the output data and the desired output data, the distribution function s*g(x+) being
defined by an equation

$$s \cdot g(x_+) = s \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{x_+^2}{2\sigma^2}\right];$$

defining a distribution function 1 - s*g(x-) of the undesirable output data by utilizing both the region s
occupied by the desired output data and a normal distribution g(x-) which is defined by both a variance $\sigma^2$
and a difference x- between the output data and the undesirable output data, the distribution function 1 -
s*g(x-) being defined by an equation

$$1 - s \cdot g(x_-) = 1 - s \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{x_-^2}{2\sigma^2}\right];$$

providing both the input data and the teaching signal data to the learning apparatus;

calculating a value of the output data by converting a value of the input data by utilizing values of the 
control parameters; 

deriving an attraction term $r_+ = \ln\sigma + x_+^2/(2\sigma^2)$ and a repulsion term
$r_- = \frac{s}{\sqrt{2\pi}\,\sigma} \exp[-x_-^2/(2\sigma^2)]$ from a logarithmic-likelihood $l_E(x)$ based on a
statistical entropy as follows,

$$-l_E(x) = -\ln g(x_+) + s \cdot g(x_-)$$
$$r_+ = -\ln g(x_+)$$
$$r_- = s \cdot g(x_-);$$

calculating a value of the attraction term r+ in cases where the desired output data is provided to the
learning apparatus as the teaching signal data, a value of the attraction term r+ being decreased along with the
decrease of the difference x+;

calculating a value of the repulsion term r- in cases where the undesirable output data is provided to the
learning apparatus as the teaching signal data, a value of the repulsion term r- being decreased along with the
increase of the difference x-;

calculating a loss function r found by adding the repulsion term r- and the attraction term r+ together;

iteratively calculating the value of the loss function r by renewing the values of the control
parameters and the variance $\sigma^2$ to decrease the value of the loss function r until the value of the loss
function r is less than a prescribed value in cases where the value of the loss function is equal to or greater
than the prescribed value; and

adjusting both the control parameters and the variance $\sigma^2$ of the learning apparatus to newest values of
both the control parameters and the variance $\sigma^2$ when the value of the loss function r is less than the
prescribed value.

In the above steps, the difference x between the output data and the teaching signal data is assumed to
be distributed according to the distribution function s*g(x), which utilizes both the density distribution function g(x)
and the region s occupied by the desired output data. The function g(x) is equivalent to a normal
distribution. The function g(x) has been determined based on a large number of calculation results
obtained by the inventor of the present invention.

Both the attraction term r+ and the repulsion term r- are derived from both the function g(x) and a
statistical entropy. According to statistical theory, when an approximate function q is the most appropriate
approximation of a true probability distribution p, the statistical entropy defined by an equation p*ln(q/p) is the
maximum value.

In the present invention, because the loss function is set to be the minimum value when the
output data is shifted close to the desired output data, the attraction term r+ related to the desired output
data corresponds to an equation -p*ln(q/p). Moreover, the loss function -ln(q) of the desired output data is
adopted to simplify the loss function by deleting the true probability distribution p.

In the present invention, the distribution function s*g(x) is adopted as the approximated function q.
Therefore, the attraction term r+ of the desired output data is

$$r_+ = -\ln[s \cdot g(x_+)].$$

That is, the attraction term r+ is calculated as $r_+ = \ln\sigma + x_+^2/(2\sigma^2)$ by eliminating constant values
such as -ln s.

In the case of the undesirable output data, the approximate function 1-q is applicable because the
output data is shifted far from the undesirable output data. Therefore, the repulsion term r- related to the
undesirable output data is $r_- = -\ln[1 - s \cdot g(x_-)]$. The repulsion term r- is approximated by a function
$r_- = s \cdot g(x_-)$ so that $r_- = \frac{s}{\sqrt{2\pi}\,\sigma} \exp[-x_-^2/(2\sigma^2)]$ is obtained.

The attraction and repulsion terms r+ and r- are gradually decreased by iteratively calculating the value
thereof while the values of the control parameters are renewed. Therefore, the output data is shifted close to
the desired output data, while the output data is shifted far from the undesirable output data.

Accordingly, both the control parameters and the variance $\sigma^2$ of the learning apparatus can be adjusted
to obtain the most appropriate approximated function.
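
The attraction and repulsion terms can be evaluated numerically as in the following sketch. It assumes scalar differences x+ and x-, and it assumes that the loss for several teaching items is the plain sum of the individual terms; the function names are hypothetical and chosen only for illustration.

```python
import math


def normal_density(x, sigma):
    """g(x) = exp(-x^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def attraction_term(x_plus, sigma):
    """r+ = ln(sigma) + x+^2 / (2 sigma^2); decreases as |x+| decreases."""
    return math.log(sigma) + x_plus * x_plus / (2.0 * sigma * sigma)


def repulsion_term(x_minus, sigma, s):
    """r- = s * g(x-); decreases as |x-| increases."""
    return s * normal_density(x_minus, sigma)


def loss(x_plus_items, x_minus_items, sigma, s):
    """Loss r as the sum of attraction and repulsion terms (summation is an assumption)."""
    attraction = sum(attraction_term(x, sigma) for x in x_plus_items)
    repulsion = sum(repulsion_term(x, sigma, s) for x in x_minus_items)
    return attraction + repulsion


# One desired item at distance 1.0 and one undesirable item at distance 0.0.
print(loss([1.0], [0.0], sigma=1.0, s=1.0))
```

In this form both the differences and the standard deviation sigma are ordinary arguments, so a training loop could renew sigma together with the control parameters, as the steps above require.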

The first object is further achieved by the provision of a learning method for adjusting both weight
parameters and threshold values of a learning apparatus by utilizing teaching signal data and a plurality of
neurons interconnected in an artificial neural network in which input data provided to first neurons of a first
stage is weighted with the weight parameters and the threshold values are subtracted from the weighted
input data so that the weighted input data is transmitted to final neurons of a final stage through hidden
stages in which the data is weighted with the weight parameters, reduced by the threshold values, and
converted by applying a prescribed monotone increasing function, after which output data is provided from
the final neurons, comprising steps of:

defining undesirable output data which is not acceptable as output data and is a type of the teaching
signal data;

defining desired output data which is desired as output data and is another type of the teaching signal
data, the desired output data occupying a region s;

defining a distribution function s*g(x+) of the desired output data by utilizing both the region s occupied
by the desired output data and a normal distribution g(x) which is defined by both a variance $\sigma^2$ and a
difference x+ between the output data and the desired output data, the distribution function s*g(x+) being
defined by an equation

$$s \cdot g(x_+) = s \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{x_+^2}{2\sigma^2}\right];$$

defining a distribution function 1 - s*g(x-) of the undesirable output data by utilizing both the region s
occupied by the desired output data and a normal distribution g(x-) which is defined by both a variance $\sigma^2$
and a difference x- between the output data and the undesirable output data, the distribution function 1 -
s*g(x-) being defined by an equation

$$1 - s \cdot g(x_-) = 1 - s \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{x_-^2}{2\sigma^2}\right];$$



providing the input data to the first neurons; 
providing the teaching signal data for the learning apparatus; 

calculating a value of the output data by weighting the input data with the weight parameters and
subtracting the threshold values from the weighted input data;

calculating a value of the output data by converting a value of the input data by utilizing values of the 
control parameters; 

deriving an attraction term $r_+ = \ln\sigma + x_+^2/(2\sigma^2)$ and a repulsion term
$r_- = \frac{s}{\sqrt{2\pi}\,\sigma} \exp[-x_-^2/(2\sigma^2)]$ from a logarithmic-likelihood $l_E(x)$ based on a
statistical entropy as follows,

$$-l_E(x) = -\ln g(x_+) + s \cdot g(x_-)$$
$$r_+ = -\ln g(x_+)$$
$$r_- = s \cdot g(x_-);$$

calculating a value of the attraction term r+ in cases where the desired output data is provided to the
learning apparatus as teaching signal data, a value of the attraction term r+ being decreased along with the
decrease of the difference x+;

calculating a value of the repulsion term r- in cases where the undesirable output data is provided to
the learning apparatus as teaching signal data, a value of the repulsion term r- being decreased along with
the increase of the difference x-;

calculating a loss function r found by adding the repulsion term r- and the attraction term r+ together;

iteratively calculating the value of the loss function r by renewing the values of the weight parameters,
the threshold values, and the variance $\sigma^2$ to decrease the value of the loss function r until the value of the
loss function r is less than a prescribed value in cases where the value of the loss function is equal to or
greater than the prescribed value; and

adjusting the weight parameters, the threshold values, and the variance $\sigma^2$ of the learning apparatus to
newest values of the weight parameters, the threshold values, and the variance $\sigma^2$ when the value of the
loss function r is less than the prescribed value.

In the above steps, the output data is considered to be distributed according to the distribution function
s*g(x) by utilizing both the region s and the density distribution function g(x) equivalent to a normal
distribution.

Both the attraction term r+ and the repulsion term r- are derived from both the function g(x) and a
statistical entropy p*ln(q/p). Here p is the true probability distribution and q is an
approximating function. The entropy takes its maximum value when the approximating function q most
appropriately approximates the true probability distribution p.

In the present invention, the distribution function s*g(x) is adopted as the approximated function q.
Therefore, because the loss function r is set to be at a minimum when the output data approaches
the desired output data, the attraction term r+ related to the desired output data is $r_+ = -\ln[s \cdot g(x_+)]$. That is,
$r_+ = \ln\sigma + x_+^2/(2\sigma^2)$ is found by eliminating constant values such as -ln s.

On the other hand, in the case of the undesirable output data, the approximate function 1-q is
applicable because the output data is shifted far from the undesirable output data. Therefore, the repulsion
term r- related to the undesirable output data is $r_- = -\ln[1 - s \cdot g(x_-)]$. The repulsion term r- is approximated by a
function $r_- = s \cdot g(x_-)$ so that $r_- = \frac{s}{\sqrt{2\pi}\,\sigma} \exp[-x_-^2/(2\sigma^2)]$ is found.
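
The two terms can be checked with a short derivation. This is a sketch that assumes g is the normal density defined above and that s*g(x-) is small enough for a first-order expansion of the logarithm; the smallness condition is an assumption, since the text only states that an approximation is made.

```latex
\begin{align*}
g(x) &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{x^{2}}{2\sigma^{2}}\right] \\
-\ln\bigl[s\,g(x_{+})\bigr]
  &= -\ln s + \tfrac{1}{2}\ln(2\pi) + \ln\sigma + \frac{x_{+}^{2}}{2\sigma^{2}}
  \;\Longrightarrow\;
  r_{+} = \ln\sigma + \frac{x_{+}^{2}}{2\sigma^{2}} \\
-\ln\bigl[1 - s\,g(x_{-})\bigr]
  &= s\,g(x_{-}) + \tfrac{1}{2}\bigl[s\,g(x_{-})\bigr]^{2} + \cdots
  \;\approx\; s\,g(x_{-})
  \;\Longrightarrow\;
  r_{-} = \frac{s}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{x_{-}^{2}}{2\sigma^{2}}\right]
\end{align*}
```

The terms -ln s and (1/2) ln(2π) do not depend on the output data or on the variance, which is why they can be dropped from r+, whereas ln σ must be kept because the variance is renewed during learning.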

In the present invention, because the learning method utilizes the artificial neural network, the input data
provided to the first neurons is automatically transmitted to the final neurons according to a predetermined
calculation so that the input data is automatically converted to the output data. The calculation in the
artificial neural network is also applied to a non-linear data conversion. For example, because the input data
is converted by the prescribed monotone increasing function in the hidden and final neurons, the input data
is non-linearly converted to output data.

In detail, the attraction and repulsion terms r+ and r- are gradually decreased by iteratively calculating
the value thereof while the values of both the weight parameters and the variance $\sigma^2$ are changed.
Therefore, the output data approaches the desired output data, while the output data is shifted far from the
undesirable output data.

Accordingly, both the control parameters and the variance $\sigma^2$ of the learning apparatus can be adjusted
to obtain the most appropriate approximated function.

The second object is achieved by the provision of a learning method for adjusting control parameters of
a learning apparatus in which many items of input data are converted into N items of output data by utilizing
the control parameters and teaching signal data, comprising steps of:

defining an output vector designated by N items of output data in N dimensions, the output vector 
indicating output coordinates of which components are equal to values of the output data; 

defining boundary specifying data which specifies a boundary surface dividing a desired region from an
undesired region in N-dimensional space, the boundary specifying data being a type of the teaching signal
data;

defining a loss function which decreases when the output coordinates indicated by the output vector are 
shifted toward the desired region from the undesired region specified by the boundary specifying data; 

providing both the input data and the boundary specifying data for the learning apparatus; 

calculating the output vector designated by the output data obtained by respectively converting each
value of the input data by utilizing values of the control parameters;

calculating a value of the loss function by utilizing both the output coordinates indicated by the 
calculated output vector and the boundary surface specified by the boundary specifying data; 

iteratively calculating the value of the loss function by renewing the values of the control parameters to
decrease the value of the loss function until the value of the loss function is less than a prescribed value in
cases where the value of the loss function is equal to or greater than the prescribed value; and

adjusting the control parameters of the learning apparatus to newest values of the control parameters 
when the value of the loss function is less than the prescribed value. 

In the above steps, the boundary specifying data specifies not only the boundary surface but also the
direction toward the desired region by specifying the desired and undesired regions. Moreover, the output
coordinates are expected to be positioned within the desired region and are expected not to be positioned
within the undesired region.

Therefore, because the value of the loss function is gradually decreased by iteratively calculating the
value of the loss function, the output coordinates are shifted toward the desired region from the undesired
region while the control parameters of the learning apparatus are changed.

Accordingly, the output coordinates specified by N items of the output data can be finally positioned in 
the desired region. 
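
One possible loss function of this kind is sketched below. The text does not fix the shape of the boundary surface or the form of the loss, so the sketch assumes a flat boundary (a hyperplane) whose normal vector points into the desired region, and it assumes a softplus of the negated signed distance as the loss; the names `signed_distance` and `boundary_loss` are hypothetical.

```python
import math


def signed_distance(output, normal, offset):
    """Signed distance of the output coordinates from the hyperplane normal.y = offset.

    Positive values lie on the side the normal points to (assumed: the desired region).
    """
    norm = math.sqrt(sum(n * n for n in normal))
    return (sum(n * y for n, y in zip(normal, output)) - offset) / norm


def boundary_loss(output, normal, offset):
    """Softplus of -d, written in a numerically stable form.

    The value decreases monotonically as the output moves toward the desired region.
    """
    d = signed_distance(output, normal, offset)
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


# A "wall" at y1 = 1 in two dimensions; the desired region is y1 > 1.
wall_normal, wall_offset = [1.0, 0.0], 1.0
print(boundary_loss([0.0, 0.0], wall_normal, wall_offset))  # undesired side
print(boundary_loss([2.0, 0.0], wall_normal, wall_offset))  # desired side
```

Because the loss depends only on which side of the boundary the output lies and how far from it, the operator supplies only the boundary and its direction, not a specific desired output value.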

The second object is also achieved by the provision of a learning method for adjusting both weight
parameters and threshold values of a learning apparatus by utilizing teaching signal data and a plurality of
neurons interconnected in an artificial neural network in which input data provided to first neurons of a first
stage is weighted with the weight parameters and the threshold values are subtracted from the weighted
input data so that the weighted input data is transmitted to final neurons of a final stage through hidden
stages in which the data is weighted with the weight parameters, reduced by the threshold values, and
converted by applying a prescribed monotone increasing function, after which output data is provided from
the final neurons, comprising steps of:

defining an output vector designated by N items of output data in N dimensions, the output vector 
designating output coordinates of which components are equal to values of the output data; 

defining boundary specifying data which specifies a boundary surface dividing a desired region from an
undesired region in N-dimensional space, the boundary specifying data being a type of the teaching signal
data;

defining a loss function which decreases when the output coordinates designated by the output vector 
are shifted toward the desired region from the undesired region specified by the boundary specifying data; 

providing the input data to the first neurons; 

providing the boundary specifying data to the learning apparatus; 

calculating the output vector designated by the output data obtained by respectively weighting the input
data with the weight parameters and subtracting the threshold values from the weighted input data;

calculating a value of the loss function by utilizing both the output coordinates designated by the 
calculated output vector and the boundary surface specified by the boundary specifying data; 

iteratively calculating the value of the loss function by renewing the weight parameters and the
threshold values to decrease the value of the loss function until the value of the loss function is less than a 
prescribed value in cases where the value of the loss function is equal to or greater than the prescribed 
value; and 

adjusting the weight parameters and the threshold values of the learning apparatus to the newest weight
parameters and the newest threshold values when the value of the loss function is less than the prescribed
value.

In the present invention, because the learning method utilizes the artificial neural network, the input data
provided for the first neurons is automatically transmitted to the final neurons according to a predetermined
calculation so that the input data is automatically converted to output data. The calculation in the artificial
neural network is also applied to a non-linear data conversion. For example, because the input data is
converted by the prescribed monotone increasing function in the hidden and final neurons, the input data is
non-linearly converted to output data.

In the above steps, the boundary specifying data specifies not only the boundary surface but also the
direction toward the desired region by specifying the desired and undesired regions. Moreover, the output
coordinates are expected to be within the desired region and are expected not to be within the undesired
region.

Therefore, because the value of the loss function is gradually decreased by iteratively calculating the
value of the loss function, the output coordinates are moved to the desired region from the undesired region
while the weight parameters and the threshold values of the learning apparatus are changed.

Accordingly, the output coordinates specified by N items of output data can be finally positioned in the 
desired region. 

BRIEF DESCRIPTION OF THE DRAWINGS 


Fig. 1 is a block diagram of a learning apparatus according to a first modification of the present 
invention, the learning apparatus being conceptually shown. 

Fig. 2 is a block diagram of a learning apparatus for learning by utilizing a back-propagation learning
method in a 3-layer feedforward artificial neural network according to the first embodiment of the first
modification.

Fig. 3 is a graphic view of a repulsion term r- of a loss function utilized in the first embodiment of the 
first modification, showing the relation between |O p -y p | and r-. 

Fig. 4 is a flowchart showing a learning method utilized in the learning apparatus shown in Fig. 2
according to the first embodiment of the first modification.

Fig. 5 is a graphic view showing the relation between the input data and the output data which is
obtained in the learning apparatus according to the first embodiment of the first modification of the present
invention.

Fig. 6 is a graphic view showing a repulsion term r-^p of a loss function according to a second
embodiment of the first modification.

Fig. 7 is a graphic view showing a repulsion term r-^p of a loss function related to the undesirable output
data according to a third embodiment of the first modification.

Fig. 8 is a graphic view showing a repulsion term r-^p of a loss function according to a fourth
embodiment of the first modification.

Fig. 9 is a block diagram of a learning apparatus for learning by utilizing a back-propagation learning
method in a 3-layer feedforward artificial neural network according to a fifth embodiment of the first
modification.

Fig. 10 is a graphic view showing the relation between the input data and the output data which is 
obtained by the learning in the learning apparatus according to the fifth embodiment of the first modification 
of the present invention. 

Fig. 11 is a flowchart showing a learning method utilized in the learning apparatus shown in Fig. 2
according to the sixth embodiment of the first modification.

Fig. 12 is a graphic view of the generalized Gaussian distribution function Ga(x|6). 

Fig. 13 is a flowchart showing a learning method utilized in the learning apparatus shown in Fig. 2
according to the seventh embodiment of the first modification.

Fig. 14 is a block diagram of a learning apparatus according to a second modification of the present
invention, the apparatus being conceptually shown.

Fig. 15 is a block diagram of a learning apparatus for learning by utilizing the back-propagation learning
method in a 3-layer feedforward artificial neural network according to a first embodiment of the second
modification.

Fig. 16 shows the boundary surface specified by the boundary vectors 7. ^ in a three-dimensional 
space. 

Fig. 17 shows the relation among the vectors O, 7 and 7 in the 3 dimensional space.

Fig. 18 is a graphic view of the relation between a boundary term r> of a loss function and the value of
the inner product d.

Fig. 19 is a flowchart showing a learning method performed in the learning apparatus shown in Fig. 15 
according to the first embodiment of the second modification. 

Fig. 20 is a graphic view showing a boundary term r>^p relating to a loss function according to a second
embodiment of the second modification.

Fig. 21 is a graphic view showing a boundary term r>^p of a loss function according to a third embodiment
of the second modification.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 


Preferred embodiments of learning apparatus and a learning method by which the learning apparatus 
learns according to the present invention are described with reference to drawings. 

Fig. 1 is a block diagram of a learning apparatus according to a first modification of the present
invention, the learning apparatus being conceptually shown.

As shown in Fig. 1, the learning apparatus for adjusting control parameters by which input data I_i (i = 1
to L) is converted to provide output data O_k (k = 1 to N) corresponding to teaching signal data y_k composed
of both desired output data and undesirable output data, comprises:

a plurality of input terminals 11 for respectively receiving the input data I_i, the terminals 11 being L in
number;

a conversion section 12 for converting the input data I_i into intermediate data by utilizing control
parameters W_j (j = 1 to M) according to a prescribed calculation;

a plurality of output terminals 13 for respectively reconverting the intermediate data converted in the
conversion section 12 according to a prescribed calculation to obtain the output data O_k and for outputting
the output data O_k, the terminals 13 being N in number;

a loss function calculating section 14 for calculating a value of a loss function r by utilizing both the
teaching signal data which is provided by an operator and the output data O_k provided from the output
terminals 13;

an attraction term calculating section 15 included in the section 14 for calculating an attraction term r+
of the loss function r by utilizing both the output data and the desired output data in cases where the
desired output data is provided to the learning apparatus as teaching signal data, the value of the attraction
term r+ being decreased when the output data O_k is shifted close to the desired output data y_k;

a repulsion term calculating section 16 included in the section 14 for calculating a repulsion term r- of
the loss function by utilizing both the output data and the undesirable output data in cases where the
undesirable output data is provided to the learning apparatus as the teaching signal data, the value of the
repulsion term r- being decreased when the output data O_k is shifted far from the undesirable output data
y_k; and

a control parameter adjusting section 17 for adjusting the control parameters W_j utilized in the
conversion section 12 to decrease the sum of the values of the attraction and repulsion terms r+, r-
calculated in the attraction and repulsion term calculating sections 15, 16 by utilizing N items of teaching
signal data (the desired output data or the undesirable output data) y_k, the input data I_i, the output data O_k,
and the control parameters W_j, the control parameters W_j stored in the conversion section 12 being
replaced with the adjusted control parameters W_j.

Output data O k which is close to the desired output data and which is not close to the undesirable 
output data is desirable. 

The teaching signal data is provided to the learning apparatus with the corresponding input data and a
symbol indicating whether the teaching signal data is the desired output data or undesirable output data.
Therefore, training example data provided by an operator is composed of the input data, the teaching signal
data, and the symbol.

The input data is sometimes provided to the learning apparatus after the learning has been completed.

In the above configuration, the input data I_i is initially received in the input section 11, and the teaching
signal data is received in the loss function calculating section 14. The input data I_i received in the input
section 11 is converted in the conversion section 12 and is transmitted to the output section 13 to provide
the output data O_k for the loss function calculating section 14.


EP 0 492 641 A2 

In the loss function calculating section 14, the value of the loss function r= r+ + r- is calculated by 
utilizing both the teaching signal data and the output data. 

For example, in cases where the desired output data is provided to the loss function calculating section
14 by the operator, the value of the attraction term r+ of the loss function is calculated in the attraction term
calculating section 15 in the same manner as in a conventional method. On the other hand, in cases where
undesirable output data is provided to the loss function calculating section 14 by the operator, the value of
the repulsion term r- of the loss function is calculated in the repulsion term calculating section 16.

The value of the loss function calculated in the loss function calculating section 14 is then transmitted to
the control parameter adjusting section 17.

In the control parameter adjusting section 17, the control parameters W_j are adjusted according to a
prescribed calculation method to decrease the value of the attraction term r+ or the repulsion term r- of the loss
function. The control parameters W_j which have already been stored in the conversion section 12 are then
replaced with the control parameters W_j adjusted in the control parameter adjusting section 17.

Thereafter, the replaced control parameters W_j are utilized to iteratively convert the input data I_i and
provide new output data O_k for the loss function calculating section 14.

Therefore, the control parameters W_j are iteratively adjusted in the control parameter adjusting section
17 and are replaced with the adjusted control parameters W_j. As a result, the loss function r is decreased to
a prescribed value.

Accordingly, the control parameters W_j can be adjusted to the most appropriate values for shifting the
output data O_k close to the desired output data and for shifting the output data O_k far from the undesirable
output data. In other words, the learning apparatus according to the first modification of the present
invention can learn by utilizing the undesirable output data.

The details of the repulsion term r- of the loss function are described in the following embodiments. 

Next, a learning apparatus and a learning method by which the learning apparatus learns according to a
first embodiment of the first modification are described with reference to Figs. 2 to 5.

Fig. 2 is a block diagram of the learning apparatus for learning by utilizing a back-propagation learning
method in a 3-layer feedforward artificial neural network according to the first embodiment of the first
modification.

As shown in Fig. 2, a learning apparatus for learning by utilizing both teaching signal data y_k and input
data I_i (i = 1 to L), which are provided by an operator, in a 3-layer feedforward artificial neural network consists
of:

an input layer 21 provided with a plurality of first neurons 22 for receiving the input data I_i at the i-th
neuron 22, the input data I_i being weighted by a prescribed weight parameter;

a hidden layer 23 provided with a plurality of hidden neurons 24 for receiving both the weighted input
data from the first neurons 22 and a threshold value θ^H_j (j = 1 to M') at the j-th hidden neuron 24 and for
outputting hidden data O^H_j from the j-th hidden neuron 24, the hidden data O^H_j being weighted by a
prescribed weight parameter;

an output layer 25 provided with a plurality of final neurons 26 for receiving both the weighted hidden
data from the hidden neurons 24 and a threshold value θ^O_k (k = 1 to N) at the k-th final neuron 26, and for
outputting the output data O_k from the k-th final neuron 26;

a first connection section 27 connected between the input layer 21 and the hidden layer 23 for
multiplying the input data I_i by a weight parameter W^HI_ji, the input data I_i being transmitted from the i-th first
neuron 22 to the j-th hidden neuron 24;

a second connection section 28 connected between the hidden layer 23 and the output layer 25 for
multiplying the hidden data O^H_j by a weight parameter W^OH_kj, the hidden data O^H_j being transmitted from
the j-th hidden neuron 24 to the k-th final neuron 26;

a loss function calculating section 29 for calculating a value of a loss function by utilizing both the
output data O_k provided from the k-th final neuron 26 and the teaching signal data y_k corresponding to the
output data O_k, the teaching signal data y_k being either desired output data or undesirable output data;

an attraction term calculating section 30 included in the section 29 for calculating an attraction term r+
of the loss function by utilizing both the output data O_k and the desired output data y_k in cases where the
desired output data y_k is provided to the learning apparatus as the teaching signal data y_k, the value of the
attraction term r+ being decreased when the output data O_k is shifted close to the desired output data y_k;

a repulsion term calculating section 31 included in the section 29 for calculating a repulsion term r- of
the loss function relating to the undesirable output data y_k by utilizing both the output data O_k and the
undesirable output data y_k in cases where the undesirable output data y_k is provided to the learning
apparatus as the teaching signal data y_k, the value of the repulsion term r- being decreased when the
output data O_k is shifted far from the undesirable output data y_k; and

a control parameter adjusting section 32 for adjusting control parameters such as the weight parameters
W^HI_ji, W^OH_kj utilized in the first and second connection sections 27, 28 and the threshold values θ^H_j, θ^O_k to
decrease the value of the loss function by utilizing the teaching signal data y_k (the desired output data or
undesirable output data), the input data I_i, the output data O_k, the weight parameters W^HI_ji, W^OH_kj, and the
threshold values θ^H_j, θ^O_k. The value of the loss function is obtained by adding the attraction and
repulsion terms r+, r- calculated in the attraction and repulsion term calculating sections 30, 31.

Relational equations for finding the output data O_k by utilizing the input data I_i, the weight parameters
W^HI_ji, W^OH_kj, and the threshold values θ^H_j, θ^O_k are as follows.

[Equations (1) and (3), which define I^H_j and I^O_k, are not legible in this copy.]

\[ O^{H}_{j} = f(I^{H}_{j}) \tag{2} \]

\[ O_{k} = f(I^{O}_{k}) \tag{4} \]

\[ f(x) = \{1 + \exp(-x)\}^{-1} \tag{5} \]

where I^H_j is a value of the data provided to the j-th hidden neuron 24, and I^O_k is a value of the data
provided to the k-th final neuron 26. Moreover, the function f indicates the conversion in the neurons 24, 26.
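
The forward conversion can be sketched in Python as follows. This is a minimal illustration, not the apparatus itself: because the net-input equations (1) and (3) are not legible in this copy, the sketch assumes the conventional form in which each neuron receives a weighted sum plus its threshold value. The function and argument names (`forward`, `w_hi`, `theta_h`, `w_oh`, `theta_o`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    # Equation (5): f(x) = {1 + exp(-x)}^-1
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, w_hi, theta_h, w_oh, theta_o):
    """Convert L inputs into N outputs through M' hidden neurons.

    inputs: (L,), w_hi: (M', L), theta_h: (M',), w_oh: (N, M'), theta_o: (N,)
    """
    # ASSUMPTION: net input = weighted sum + threshold (equations (1), (3)
    # are illegible in the source, so the sign convention is a guess).
    net_h = w_hi @ inputs + theta_h
    out_h = sigmoid(net_h)            # equation (2): hidden data O^H_j
    net_o = w_oh @ out_h + theta_o
    out = sigmoid(net_o)              # equation (4): output data O_k
    return out_h, out

# Example: one input, five hidden neurons, one output.
hidden, output = forward(
    np.array([0.3]),
    w_hi=np.full((5, 1), 10.0),
    theta_h=np.array([0.0, 2.5, 5.0, 7.5, 10.0]),
    w_oh=np.zeros((1, 5)),
    theta_o=np.zeros(1),
)
```

With all hidden-to-output weights and the output threshold at zero, the output neuron receives zero and therefore emits f(0) = 0.5 whatever the input is.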

In the present invention, Q sets of training example data composed of the input data I_i, the teaching
signal data y_k, and the symbol Tp are provided to the learning apparatus. Whether the teaching signal data
y_k is the desired output data or undesirable output data is distinguished by utilizing the symbol Tp. That is,
the p-th (p = 1 to Q) set of teaching signal data y_k is the desired output data when Tp is indicated by a
symbol + (Tp = +), while the p-th set of teaching signal data y_k is the undesirable output data when Tp is
indicated by a symbol - (Tp = -).

In this case, the p-th set of training example data I_i, y_k is indicated by I_i^p, y_k^p, and the output data O_k
calculated by applying the equations (1) to (5) is indicated by O_k^p. Moreover, the attraction and repulsion
terms r+, r- of the loss function calculated in the loss function calculating section 29 are indicated by r+^p,
r-^p, and the others are indicated by attaching the letter p.

The attraction term r+^p of the loss function and the repulsion term r-^p of the loss function are
respectively defined by the following equations.

\[ r_{+}^{p} = \frac{1}{2} \sum_{k=1}^{N} (O_{k}^{p} - y_{k}^{p})^{2} \tag{6} \]

\[ r_{-}^{p} = \gamma_{-} \, \sigma_{-}^{2} \exp\left\{ \frac{1}{2} - \frac{\sum_{k=1}^{N} (O_{k}^{p} - y_{k}^{p})^{2}}{2 \sigma_{-}^{2}} \right\} \tag{7} \]

where γ- is a parameter for adjusting a contributory ratio of the undesirable output data to the desired
output data, and σ- is a parameter for adjusting the influence of the undesirable output data on the output
data.
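
As a rough illustration, the two terms can be written as small Python functions. The squared-distance form of the attraction term follows equation (6). For the repulsion term, equation (7) is badly garbled in this copy, so the sketch assumes a Gaussian in the distance with a constant prefactor; the exact prefactor is an assumption and only scales the term. The names `attraction_term`, `repulsion_term`, `gamma` and `sigma` are hypothetical.

```python
import numpy as np

def attraction_term(output, desired):
    # Equation (6): half the squared distance to the desired output.
    diff = np.asarray(output, dtype=float) - np.asarray(desired, dtype=float)
    return 0.5 * float(np.sum(diff ** 2))

def repulsion_term(output, undesirable, gamma=1.0, sigma=0.05):
    # ASSUMED reading of equation (7): a Gaussian bump centred on the
    # undesirable output, scaled by gamma * sigma^2 * exp(1/2).
    diff = np.asarray(output, dtype=float) - np.asarray(undesirable, dtype=float)
    sq = float(np.sum(diff ** 2))
    return gamma * sigma ** 2 * float(np.exp(0.5 - sq / (2.0 * sigma ** 2)))

# The attraction term shrinks as the output approaches the target,
# the repulsion term shrinks as the output moves away from it.
near = repulsion_term([0.50], [0.50])
far = repulsion_term([0.70], [0.50])
```

Here `gamma` sets how strongly undesirable examples count against desired ones, and `sigma` sets the distance over which an undesirable example is felt.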

The features of the attraction and repulsion terms r+^p, r-^p are as follows. The attraction term r+^p is
decreased when the output data O_k^p is shifted close to the desired output data y_k^p. That is, because it is
desired that the desired output be provided as the output data, the decrease of the attraction term r+^p is appropriate.

Fig. 3 is a graphic view of the repulsion term r-^p utilized in the first embodiment of the first
modification, showing the relation between {Σ_k (O_k^p - y_k^p)^2}^1/2 and r-^p. {Σ_k (O_k^p - y_k^p)^2}^1/2 is shown by a simple
indication |O^p - y^p| for a Y-axis in Fig. 3.

As shown in Fig. 3, the repulsion term r-^p is decreased when the output data O_k^p is shifted far from the
undesirable output data y_k^p. That is, because it is not desired that the undesirable output be provided as the
output data, the decrease of the repulsion term r-^p relating to the undesirable output data is appropriate.

Therefore, in cases where the desired output data is provided to the learning apparatus as prescribed
sets of teaching signal data and the undesirable output data is provided to the learning apparatus as the
remaining sets of teaching signal data, adjustment of the control parameters to decrease the sum of the
attraction and repulsion terms Σ (r+^p + r-^p) is appropriate because both the attraction and repulsion terms
r+^p, r-^p tend to be decreased.

Accordingly, a loss function r is defined for adjusting the control parameters such as the weight
parameters W^HI_ji, W^OH_kj and the threshold values θ^H_j, θ^O_k to the most appropriate values by utilizing the Q
sets of the training example data as follows.

\[ r = \frac{1}{Q} \sum_{p} r_{T_p}^{p} = \frac{1}{Q} (r_{+} + r_{-}) = \frac{1}{Q} \left( \sum_{p:D} r_{+}^{p} + \sum_{p:U} r_{-}^{p} \right) \tag{8} \]

where r_Tp^p is a value of the loss function obtained when the p-th learning data is provided to the learning
apparatus. Therefore, r_Tp^p is either r+^p or r-^p according to the type of teaching signal data. Σ_{p:D} indicates
the sum of all attraction terms, and Σ_{p:U} indicates the sum of all repulsion terms.
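
The averaging over the Q training examples can be sketched as below. The per-example terms follow equation (6) and an assumed Gaussian reading of the garbled equation (7); the function name `total_loss` and the use of the strings "+" and "-" for the symbol Tp are hypothetical choices made for the illustration.

```python
import numpy as np

def total_loss(outputs, targets, symbols, gamma=1.0, sigma=0.05):
    """Equation (8): average of attraction and repulsion terms over Q examples."""
    total = 0.0
    for out, y, symbol in zip(outputs, targets, symbols):
        diff = np.asarray(out, dtype=float) - np.asarray(y, dtype=float)
        sq = float(np.sum(diff ** 2))
        if symbol == "+":
            # desired output data: attraction term, equation (6)
            total += 0.5 * sq
        elif symbol == "-":
            # undesirable output data: ASSUMED Gaussian form of equation (7)
            total += gamma * sigma ** 2 * float(np.exp(0.5 - sq / (2.0 * sigma ** 2)))
        else:
            raise ValueError("symbol must be '+' or '-'")
    return total / len(symbols)

# Two examples: one desired target that is missed by 1.0,
# one undesirable target that the output sits exactly on.
r = total_loss([[1.0], [0.0]], [[0.0], [0.0]], ["+", "-"])
```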

That is, when the value of the loss function r is decreased to less than a prescribed value, the control
parameters W^HI_ji, W^OH_kj, θ^H_j, and θ^O_k are judged to be adjusted to the most appropriate values.

The contributory ratio Σr-^p/Σr+^p of the undesirable output data to the desired output data generally
varies according to properties of the teaching signal data. Therefore, the contributory ratio Σr-^p/Σr+^p is
adjusted by utilizing the parameter γ-. That is, in cases where the parameter γ- is increased, the repulsion
term r- of the loss function is increased so that the influence of the undesirable output data is increased.

Moreover, the influence of the undesirable output data on the output data can be specified by adjusting
the parameter σ-.

The repulsion term r- is adopted because the shape of the repulsion term r- is simple and the value of
the repulsion term r- is promptly decreased when the output data O_k^p provided from the output layer 25 is
shifted far from the undesirable output data. The features of the repulsion term r- relating to the undesirable
output data are described as follows.

As shown in Fig. 3, a partial derivative of r- with respect to |O^p - y^p| gradually approaches zero
as |O^p - y^p| becomes larger. Therefore, when the output data O_k^p is far from the corresponding
undesirable output data y_k^p, the repulsion term r- is practically unchanged regardless of whether the control
parameters W^HI_ji, W^OH_kj, θ^H_j, and θ^O_k are adjusted to new values. In other words, the influence on the output
data of items of undesirable output data far from the output data is small, and the control parameters
W^HI_ji, W^OH_kj, θ^H_j, and θ^O_k are adjusted by utilizing the other items of undesirable output data y_k^p which are
comparatively close to the corresponding output data O_k^p. Therefore, the learning can be stably
implemented.

Moreover, the partial derivative ∂r-/∂|O^p - y^p| equals zero when |O^p - y^p| equals zero. Therefore, when each
item of output data O_k^p is close to the corresponding undesirable output data y_k^p, the output data O_k^p can
be easily shifted in any direction in an N-dimensional space.

Accordingly, the learning method in which the undesirable output data y_k^p is utilized can be utilized for
general purposes such as robot control.

Moreover, because the repulsion term r- is exponentially decreased with respect to |O^p - y^p|, the
influence of the undesirable output data on the output data O_k^p is limited to a short range.
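
These slope properties can be checked numerically. The sketch below assumes the repulsion term is a Gaussian in the distance x = |O^p - y^p|, with the prefactor read from the garbled equation (7); under that assumption the slope is -(x / sigma^2) times the term itself. The names `repulsion` and `repulsion_slope` are hypothetical.

```python
import numpy as np

def repulsion(x, gamma=1.0, sigma=0.05):
    # ASSUMED form: gamma * sigma^2 * exp{1/2 - x^2 / (2 sigma^2)}
    return gamma * sigma ** 2 * np.exp(0.5 - x ** 2 / (2.0 * sigma ** 2))

def repulsion_slope(x, gamma=1.0, sigma=0.05):
    # Analytic derivative of the assumed form with respect to the distance x.
    return -(x / sigma ** 2) * repulsion(x, gamma, sigma)

# Slope at zero distance, at one sigma, and far away.
for distance in (0.0, 0.05, 0.5):
    print(distance, repulsion_slope(distance))
```

Under this form the slope is exactly zero at x = 0, reaches its largest magnitude (gamma times sigma) at x = sigma, and is negligible at ten sigma, which matches the short-range behaviour described above.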

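Next, a method for adjusting the control parameters W^HI_ji, W^OH_kj, θ^H_j, and θ^O_k is described.

Renewed values ΔW^HI_ji, ΔW^OH_kj for slightly adjusting the weight parameters W^HI_ji, W^OH_kj for each
iterative calculation are defined as follows.

\[ \frac{\partial r}{\partial W^{HI}_{ji}} = \frac{1}{Q} \left( \sum_{p:D} \frac{\partial r_{+}^{p}}{\partial W^{HI}_{ji}} + \sum_{p:U} \frac{\partial r_{-}^{p}}{\partial W^{HI}_{ji}} \right) \tag{9} \]

\[ \Delta W^{HI}_{ji}(t+1) = -(1-\alpha) \, \eta \, \frac{\partial r}{\partial W^{HI}_{ji}} + \alpha \, \Delta W^{HI}_{ji}(t) \tag{10} \]

\[ \frac{\partial r}{\partial W^{OH}_{kj}} = \frac{1}{Q} \left( \sum_{p:D} \frac{\partial r_{+}^{p}}{\partial W^{OH}_{kj}} + \sum_{p:U} \frac{\partial r_{-}^{p}}{\partial W^{OH}_{kj}} \right) \tag{11} \]

\[ \Delta W^{OH}_{kj}(t+1) = -(1-\alpha) \, \eta \, \frac{\partial r}{\partial W^{OH}_{kj}} + \alpha \, \Delta W^{OH}_{kj}(t) \tag{12} \]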

where t is an iteration number of the calculations implemented to adjust the control parameters so that both
ΔW^HI_ji(t) and ΔW^OH_kj(t) indicate the renewed values at the iteration t. Moreover, η is a learning rate, and α is
a momentum parameter. 1/Q is a standardization term for standardizing the partial derivatives.

That is, each renewed value at the iteration (t + 1) is influenced by the renewed value at the iteration t
while being adjusted by the momentum parameter α. The contribution of the partial derivatives ∂r/∂W^HI_ji,
∂r/∂W^OH_kj to the renewed values ΔW^HI_ji(t), ΔW^OH_kj(t) is determined by the learning rate η.
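
A single renewal of one parameter can be sketched as follows. The function implements the momentum form shared by equations (10) and (12), followed by adding the renewed value to the parameter; the name `momentum_step` and the default values of `eta` and `alpha` are illustrative assumptions.

```python
def momentum_step(param, prev_delta, grad, eta=10.0, alpha=0.5):
    """Return the renewed parameter and the renewed value (delta).

    delta(t+1) = -(1 - alpha) * eta * grad + alpha * delta(t)
    param(t+1) = param(t) + delta(t+1)
    """
    delta = -(1.0 - alpha) * eta * grad + alpha * prev_delta
    return param + delta, delta

# First iteration: no previous renewed value, so only the gradient acts.
w, dw = momentum_step(param=0.0, prev_delta=0.0, grad=0.02)
# Second iteration with a zero gradient: half of the previous step carries over.
w, dw = momentum_step(param=w, prev_delta=dw, grad=0.0)
```

With `alpha` at zero the rule reduces to plain steepest descent with step `eta`; a larger `alpha` gives the previous step more weight and the current gradient less.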

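Moreover, renewed values Δθ^H_j, Δθ^O_k for slightly adjusting the threshold values θ^H_j, θ^O_k for each iterative
calculation are defined as follows.

\[ \frac{\partial r}{\partial \theta^{H}_{j}} = \frac{1}{Q} \left( \sum_{p:D} \frac{\partial r_{+}^{p}}{\partial \theta^{H}_{j}} + \sum_{p:U} \frac{\partial r_{-}^{p}}{\partial \theta^{H}_{j}} \right) \tag{13} \]

\[ \Delta \theta^{H}_{j}(t+1) = -(1-\alpha) \, \eta \, \frac{\partial r}{\partial \theta^{H}_{j}} + \alpha \, \Delta \theta^{H}_{j}(t) \tag{14} \]

\[ \frac{\partial r}{\partial \theta^{O}_{k}} = \frac{1}{Q} \left( \sum_{p:D} \frac{\partial r_{+}^{p}}{\partial \theta^{O}_{k}} + \sum_{p:U} \frac{\partial r_{-}^{p}}{\partial \theta^{O}_{k}} \right) \tag{15} \]

\[ \Delta \theta^{O}_{k}(t+1) = -(1-\alpha) \, \eta \, \frac{\partial r}{\partial \theta^{O}_{k}} + \alpha \, \Delta \theta^{O}_{k}(t) \tag{16} \]

Therefore, renewed weight parameters W^HI_ji(t+1), W^OH_kj(t+1) and the renewed threshold values θ^H_j(t+1),
θ^O_k(t+1) are found as follows.

\[ W^{HI}_{ji}(t+1) = W^{HI}_{ji}(t) + \Delta W^{HI}_{ji}(t+1) \tag{17} \]

\[ W^{OH}_{kj}(t+1) = W^{OH}_{kj}(t) + \Delta W^{OH}_{kj}(t+1) \tag{18} \]

\[ \theta^{H}_{j}(t+1) = \theta^{H}_{j}(t) + \Delta \theta^{H}_{j}(t+1) \tag{19} \]

\[ \theta^{O}_{k}(t+1) = \theta^{O}_{k}(t) + \Delta \theta^{O}_{k}(t+1) \tag{20} \]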

As a result, the loss function r is decreased by utilizing the control parameters such as the weight
parameters W^HI_ji(t), W^OH_kj(t) and the threshold values θ^H_j(t), θ^O_k(t) which are iteratively adjusted by applying
the equations (9) to (16). Finally, the loss function r is minimized by utilizing the control parameters adjusted
to the most appropriate values. A method of minimization based on the equations (9) to (20) is well known
as the steepest-descent method.

A condition for stopping the iterative calculations for adjusting the control parameters to the most
appropriate values is determined as follows.

\[ r < E_{L} \tag{21} \]

where E_L is a finishing parameter for determining the finishing conditions.

Next, a learning method by which the learning apparatus shown in Fig. 2 learns by adjusting the control
parameters such as the weight parameters W^HI_ji, W^OH_kj and the threshold values θ^H_j, θ^O_k is described. In the
learning method, the attraction and repulsion terms r+, r- defined by the equations (6), (7) are utilized.

Fig. 4 is a flowchart showing a learning method for the learning apparatus shown in Fig. 2 according to
the first embodiment of the first modification.

As shown in Fig. 4, a parameter p designating a set number of training example data {I_i^p, y_k^p, Tp}
composed of the teaching signal data y_k^p, the input data I_i^p, and the symbol Tp is set to 1 (p = 1) in a step
101 when the training example data {I_i^p, y_k^p, Tp} is provided to the learning apparatus by an operator.
Thereafter, L items of input data I_i^p are provided to the first neurons 22, and N items of teaching signal data
y_k^p and the symbol Tp are automatically transmitted to the loss function calculating section 29 in a step
102.

Thereafter, N items of output data O_k^p are calculated in the 3-layer feedforward artificial neural network
by applying the equations (1) to (5) before the output data O_k^p is transmitted to the loss function calculating
section 29 in a step 103.

Thereafter, whether the teaching signal data y_k^p is the desired output data or undesirable output data is
judged by inspecting the symbol Tp in a prescribed control section (not shown) in a step 104. That is, in
cases where Tp indicates the symbol +, the teaching signal data y_k^p transmitted to the section 29 is judged
to be the desired output data. On the other hand, in cases where Tp indicates the symbol -, the teaching
signal data y_k^p transmitted to the section 29 is judged to be the undesirable output data.

In cases where the teaching signal data y_k^p is judged to be the desired output data in the step 104, a
value of the attraction term r+^p is calculated in the attraction term calculating section 30 by utilizing the N
items of desired output data according to the equation (6), and the calculated value is stored in a prescribed
memory (not shown) of the section 30 in a step 105. Thereafter, the control parameters W^HI_ji, W^OH_kj, θ^H_j, and
θ^O_k utilized in the first and second connection sections 27, 28 are transmitted to the control parameter
adjusting section 32 so that partial derivatives ∂r+^p/∂W^HI_ji, ∂r+^p/∂W^OH_kj, ∂r+^p/∂θ^H_j, and ∂r+^p/∂θ^O_k are
calculated by utilizing the value r+^p stored in the section 30 in the control parameter adjusting section 32
and are stored in a memory (not shown) of the section 32 in a step 106.

In cases where the teaching signal data y_k^p is judged to be the undesirable output data in the step 104,
a value of the repulsion term r-^p is calculated in the repulsion term calculating section 31 by utilizing the N
items of undesirable output data according to the equation (7), and the calculated value is stored in a
prescribed memory (not shown) of the section 31 in a step 107. Thereafter, the control parameters W^HI_ji,
W^OH_kj, θ^H_j, and θ^O_k utilized in the first and second connection sections 27, 28 are transmitted to the control
parameter adjusting section 32 so that partial derivatives ∂r-^p/∂W^HI_ji, ∂r-^p/∂W^OH_kj, ∂r-^p/∂θ^H_j, and ∂r-^p/∂θ^O_k
are calculated by utilizing the value r-^p stored in the section 31 in the control parameter adjusting section
32 and are stored in the memory of the section 32 in a step 108.

Thereafter, whether or not the parameter p equals the number Q of items of training example data is
judged in the prescribed control section in a step 109. In cases where the parameter p is judged not to
equal the number Q, the parameter p is incremented by 1 in a step 110 so that the procedures from the
step 102 to the step 109 are repeated until the attraction and repulsion terms r+^p, r-^p and the partial
derivatives ∂r+^p/∂W^HI_ji, ∂r+^p/∂W^OH_kj, ∂r+^p/∂θ^H_j, ∂r+^p/∂θ^O_k, ∂r-^p/∂W^HI_ji, ∂r-^p/∂W^OH_kj, ∂r-^p/∂θ^H_j, and
∂r-^p/∂θ^O_k are stored for all the sets of training example data in the memory of the sections 29,
32.

In cases where the parameter p is judged to equal the number Q in the step 109, this judgement means
that all sets of training example data have been calculated to provide Q sets of output data for the section
29. Moreover, the judgement means that the attraction and repulsion terms r+^p, r-^p and the partial
derivatives ∂r+^p/∂W^HI_ji, ∂r+^p/∂W^OH_kj, ∂r+^p/∂θ^H_j, ∂r+^p/∂θ^O_k, ∂r-^p/∂W^HI_ji, ∂r-^p/∂W^OH_kj, ∂r-^p/∂θ^H_j, and
∂r-^p/∂θ^O_k are stored for all the sets of training example data in the memory of the sections 29, 32.
Therefore, renewed values ΔW^HI_ji(t), ΔW^OH_kj(t), Δθ^H_j(t), and Δθ^O_k(t) are calculated by applying the equations
(9) to (16) so that the weight parameters W^HI_ji(t), W^OH_kj(t) and the threshold values θ^H_j(t), θ^O_k(t) are slightly
renewed by applying the equations (17) to (20) in a step 111. That is, renewed weight parameters W^HI_ji(t+1),
W^OH_kj(t+1) and renewed threshold values θ^H_j(t+1), θ^O_k(t+1) are determined. Moreover, the value of
the loss function r is then found by utilizing the attraction and repulsion terms r+^p, r-^p stored in the memory
of the section 29 according to the equation (8) in a step 112. In cases where the calculations in the steps
111, 112 are implemented for the first time, the iteration number t equals 1.

Thereafter, whether or not the value of the loss function r satisfies the finishing condition is judged by
applying the equation (21) in a step 113. That is, in cases where the value of the loss function r is equal to
or greater than E_L, the finishing condition is not satisfied, and therefore the iteration number t is incremented by
1. Thereafter, the procedures from the step 101 to the step 113 are iterated while utilizing the renewed
control parameters W^HI_ji(t+1), W^OH_kj(t+1), θ^H_j(t+1), and θ^O_k(t+1) in place of the control parameters W^HI_ji(t),
W^OH_kj(t), θ^H_j(t), and θ^O_k(t).

The iteration of the procedures from the step 101 to the step 113 is continued until the finishing
condition is satisfied.

In cases where the value of the loss function r is less than E_L, the finishing condition is satisfied in the
step 113 so that the learning in the learning apparatus is finished. That is, the control parameters such as
the weight parameters and the threshold values are adjusted to the most appropriate values.
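
The flow of steps 101 to 113 can be sketched as a short batch training loop. This is an illustration under several assumptions rather than the method of Fig. 4 itself: the net input is taken as a weighted sum plus threshold, the repulsion term uses an assumed Gaussian form, the partial derivatives are approximated by central differences instead of back-propagation, and a hypothetical iteration cap `t_max` keeps the loop finite. All names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unpack(params, n_in, n_hid, n_out):
    # Split one flat vector into W_HI, theta_H, W_OH, theta_O.
    a = n_hid * n_in
    b = a + n_hid
    c = b + n_out * n_hid
    return (params[:a].reshape(n_hid, n_in), params[a:b],
            params[b:c].reshape(n_out, n_hid), params[c:])

def loss(params, shape, examples, gamma=1.0, sigma=0.05):
    w_hi, th_h, w_oh, th_o = unpack(params, *shape)
    total = 0.0
    for x, y, symbol in examples:                                 # steps 102-110
        out = sigmoid(w_oh @ sigmoid(w_hi @ x + th_h) + th_o)     # step 103
        sq = float(np.sum((out - y) ** 2))
        if symbol == "+":                                         # step 104
            total += 0.5 * sq                                     # step 105
        else:
            total += gamma * sigma ** 2 * float(np.exp(0.5 - sq / (2 * sigma ** 2)))  # step 107
    return total / len(examples)                                  # step 112

def numeric_grad(params, shape, examples, h=1e-6):
    # Central differences stand in for the stored partial derivatives.
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (loss(params + step, shape, examples)
                   - loss(params - step, shape, examples)) / (2.0 * h)
    return grad

def train(params, shape, examples, eta=1.0, alpha=0.5, e_l=1e-5, t_max=500):
    delta = np.zeros_like(params)
    r = loss(params, shape, examples)
    for _ in range(t_max):
        if r < e_l:                                               # step 113
            return params, r, True
        grad = numeric_grad(params, shape, examples)
        delta = -(1.0 - alpha) * eta * grad + alpha * delta       # step 111
        params = params + delta
        r = loss(params, shape, examples)
    return params, r, r < e_l

shape = (1, 2, 1)                          # L = 1, M' = 2, N = 1
start = np.linspace(-0.5, 0.5, 7)
examples = [(np.array([1.0]), np.array([0.9]), "+"),
            (np.array([0.0]), np.array([0.2]), "-")]
r_start = loss(start, shape, examples)
final_params, r_final, finished = train(start, shape, examples, t_max=50)
```

Each training example is a tuple of input, teaching signal and symbol, mirroring the training example data {I_i^p, y_k^p, Tp}. Finite differences keep the sketch short; a practical implementation would compute the same derivatives analytically.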

Accordingly, after the weight parameters and the threshold values are adjusted to the most appropriate
values, the operator can accomplish the most appropriate diagnosis, the most appropriate robot control, or
the like without utilizing the section 29.

Next, the results of the learning in the learning apparatus are shown.

Fig. 5 is a graphic view showing the relation between the input data and the output data which is
obtained by the learning in the learning apparatus according to the first embodiment of the first modification
of the present invention.

The values of the parameters utilized for implementing the learning are as follows.

L = 1, M' = 5, N = 1, γ- = 1, α = 0.5, σ- = 0.05, η = 10, and E_L = 10^-5

Moreover, five items of desired output data are provided, and two items of undesirable output data are 
provided. 

The initial values of the weight parameters and the threshold values are as follows.

θ^H_1(0) = 0, θ^H_2(0) = 2.5, θ^H_3(0) = 5, θ^H_4(0) = 7.5, θ^H_5(0) = 10, θ^O_1(0) = 0

W^OH_11(0) = 0, W^OH_12(0) = 0, W^OH_13(0) = 0, W^OH_14(0) = 0, W^OH_15(0) = 0, W^HI_11(0) = 10, W^HI_21(0) = 10,
W^HI_31(0) = 10, W^HI_41(0) = 10, W^HI_51(0) = 10

As shown in Fig. 5, a large number of items of input data with values between 0 and 1 are
provided one by one to the learning apparatus, and the output data is obtained for each item of input data.

A curved dashed line indicates the result of the learning implemented by utilizing five items of desired
output data without utilizing any undesirable output data. A curved solid line indicates the result of the learning
implemented by utilizing both five items of desired output data and two items of undesirable output data.

The curved dashed line passes through the five items of desired output data without avoiding the two items of
undesirable output data. On the other hand, the curved solid line passes through the five items of desired
output data while avoiding the two items of undesirable output data.

That is, the learning in the learning apparatus is implemented to decrease the difference between the 
desired output data and the output data, while the learning is implemented to increase the difference 
between the undesirable output data and the output data. 

Accordingly, the learning can be implemented with ease regardless of whether the teaching signal data
is the desired output data or undesirable output data, so that the learning performance can be improved.
That is, the learning can be implemented according to a learning condition.

In the first embodiment of the first modification, the loss function r occasionally does not reach the finishing condition because the artificial neural network is a non-linear type. In this case, it is preferable that a maximum iteration number $t_{MAX}$ be added to the finishing condition. That is, the learning for adjusting the control parameters is stopped when the number of iterations reaches the maximum iteration number $t_{MAX}$ regardless of whether the finishing condition designated by the equation (21) is satisfied. Thereafter, initial values of the control parameters are reset by the operator before the learning is resumed.
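
The combined stopping rule can be sketched as follows. This is only an illustration under stated assumptions: `run_learning`, `update`, and `loss` are hypothetical names, and the toy loss that halves on every iteration stands in for the actual loss function r.

```python
def run_learning(update, loss, e_l, t_max):
    """Iterate `update` until loss() < e_l or until t_max iterations are reached.

    Returns (iterations_used, finished). `finished` is False when the loop was
    stopped by the iteration limit rather than by the loss threshold.
    """
    for t in range(1, t_max + 1):
        update()
        if loss() < e_l:
            return t, True
    return t_max, False

# Toy stand-in for the learning: a "loss" that halves on every update.
state = {"r": 1.0}

def halve():
    state["r"] *= 0.5

result_ok = run_learning(halve, lambda: state["r"], e_l=1e-2, t_max=100)
state["r"] = 1.0  # reset before the second run
result_capped = run_learning(halve, lambda: state["r"], e_l=1e-2, t_max=3)
```

When `finished` is False, the caller would reset the initial values of the control parameters and start again, as described above.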

Moreover, either undesirable output data or desired output data is provided to the learning apparatus as a set of training example data in the first embodiment. However, it is preferable that both the undesirable output data and the desired output data be intermingled in the same set of training example data.

Next, learning methods according to the other embodiments of the first modification are described with 
reference to Figs. 6 to 8. 

Fig. 6 is a graphic view showing a repulsion term $r_-^p$ of the loss function according to a second embodiment of the first modification. $\{\sum_k (O_k^p - y_k^p)^2\}^{1/2}$ is shown by a simple indication $|O^p - y^p|$ for a Y-axis in Fig. 6.

The repulsion term r- p shown in Fig. 6 is defined by utilizing the following equation. 

$$r_- = \sum_{p:U} r_-^p$$

$$r_-^p = \exp\Bigl(-\bigl\{\sum_k (O_k^p - y_k^p)^2\bigr\}^{1/2} / \beta\Bigr) \qquad (22)$$

The features of the repulsion term $r_-^p$ defined by the equation (22) are as follows.
A partial derivative of the repulsion term $r_-^p$ with respect to the difference $|O^p - y^p|$ between the teaching signal data and the output data is large when $|O^p - y^p|$ is a small value. Therefore, the weight parameters and the threshold values are largely renewed so as to increase $|O^p - y^p|$ in early iterative calculations. That is, the output data is rapidly shifted far from the undesirable output data.

Accordingly, though the speed of learning is fast, the direction of learning is decided in the early iterative calculations. In other words, the direction of learning is limited in the N-dimensional space.

Fig. 7 is a graphic view showing a repulsion term $r_-^p$ of the loss function according to a third embodiment of the first modification. $\{\sum_k (O_k^p - y_k^p)^2\}^{1/2}$ is shown by a simple indication $|O^p - y^p|$ for a Y-axis in Fig. 7.

The repulsion term $r_-^p$ shown in Fig. 7 is defined by utilizing the following equation.

$$r_- = \sum_{p:U} r_-^p$$

(23)


The features of the repulsion term $r_-^p$ defined by the equation (23) are as follows.
A partial derivative of the repulsion term $r_-^p$ with respect to the difference $|O^p - y^p|$ between the teaching signal data and the output data diverges when $|O^p - y^p|$ equals zero. Therefore, the repulsion between the undesirable output data and the output data is large, so that the output data is shifted far from the undesirable output data. However, the reliability of the calculation is not superior because the value of the repulsion term $r_-^p$ is rapidly decreased in the early iterative calculations. The extent of the repulsion can be adjusted by varying the value n. Moreover, because the value of the repulsion term $r_-^p$ is algebraically decreased when $|O^p - y^p|$ is increased, the influence of the undesirable output data is exerted on output data positioned farther from the undesirable output data than in the first and second embodiments, in which the repulsion term $r_-^p$ is exponentially decreased.

Fig. 8 is a graphic view showing a repulsion term $r_-^p$ relating to the undesirable output data according to a fourth embodiment of the first modification. $\{\sum_k (O_k^p - y_k^p)^2\}^{1/2}$ is shown by a simple indication $|O^p - y^p|$ for a Y-axis in Fig. 8.

The repulsion term $r_-^p$ shown in Fig. 8 is defined by the equation (24).



The features of the repulsion term $r_-^p$ defined by the equation (24) are as follows.
A partial derivative of the repulsion term $r_-^p$ with respect to the difference $|O^p - y^p|$ between the teaching signal data and the output data diverges when $|O^p - y^p|$ equals zero. Therefore, the repulsion between the undesirable output data and the output data is large, so that the output data is shifted as far as possible from the undesirable output data in the same manner as in the third embodiment. However, the reliability of the calculation is not superior, in the same manner as in the third embodiment. Moreover, the value of the repulsion term $r_-^p$ does not gradually approach zero when $|O^p - y^p|$ is increased. Therefore, the finishing conditions must be changed. That is, the finishing condition of the repulsion term $r_- = \sum_{p:U} r_-^p$ is judged independently of that of the attraction term $r_+$. In detail, the finishing condition of the attraction term $r_+$ is as follows.

$$r_+ < E_{L+} \qquad (25)$$

where $E_{L+}$ is a finishing parameter of the loss function $r_+$. On the other hand, when each repulsion term $r_-^p$ is less than a prescribed positive value, the finishing condition of the repulsion term $r_-$ is satisfied.

Next, a fifth embodiment of the first modification is described.

Fig. 9 is a block diagram of a learning apparatus which learns by utilizing a back-propagation learning method in a 3-layer feedforward artificial neural network according to a fifth embodiment of the first modification.

As shown in Fig. 9, the learning apparatus according to the fifth embodiment comprises:

the input layer 21 provided with the first neurons 22;

the hidden layer 23 provided with the hidden neurons 24;

the output layer 25 provided with the final neurons 26;

the first and second connection sections 27, 28;

a loss function calculating section 41 for calculating the repulsion term $r_-^p$ defined by the equation (7), (22), (23), or (24) by utilizing both the output data $O_k^p$ provided from the k-th final neuron 26 and the undesirable output data $y_k^p$;

the repulsion term calculating section 31 included in the section 41; and

a control parameter adjusting section 42 for adjusting control parameters such as the weight parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$ utilized in the first and second connection sections 27, 28 and the threshold values $\theta^H_j$, $\theta^O_k$ to decrease the value of the repulsion term $r_-$ calculated in the repulsion term calculating section 31 by utilizing the undesirable output data $y_k$, the input data $I_i$, the output data $O_k$, the weight parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, and the threshold values $\theta^H_j$, $\theta^O_k$.

In the above configuration, the learning apparatus receives the input data $I_i^p$ and the undesirable output data $y_k^p$ in which the symbol Tp is not included. Thereafter, as shown in Fig. 4, the procedures from the step 101 to the step 103 and the procedures from the step 107 to the step 113 are implemented in the same manner as in the first embodiment.

Fig. 10 is a graphic view showing the relation between the input data and the output data which is 
obtained by the learning in the learning apparatus according to the fifth embodiment of the first modification 
of the present invention. 

The values of the parameters utilized for learning are the same as those utilized for learning as shown in Fig. 5. The initial values of the control parameters are the same as those utilized for learning as shown in Fig. 5. Moreover, 18 items of undesirable output data are provided, and a large number of items of input data of which values are between 0 and 1.0 are provided to the learning apparatus. Therefore, the output data is obtained for each item of input data.

As shown in Fig. 10, four types of half-finished learning results and a final learning result were obtained. These results are indicated by solid lines. A first half-finished result is obtained by once implementing the procedures from the step 101 to the step 113 (t = 1). A second half-finished result is obtained by iterating the procedures 10 times (t = 10). A third half-finished result is obtained by iterating the procedures 100 times (t = 100). A fourth half-finished result is obtained by iterating the procedures 1000 times (t = 1000). The final result is obtained by iterating the procedures 11487 times (t = 11487).
The repulsion term $r_-$ satisfied the finishing condition when the procedures were iterated 11487 times.

As shown in Fig. 10, the learning result shifted far from all the undesirable output data can be gradually obtained by iterating the procedures. Moreover, the undesirable output data close to the output data exerts a strong influence on the output data, while the undesirable output data far from the output data does not exert a strong influence on the output data. On the other hand, the desired output data generally exerts a strong influence on the output data.

Accordingly, the adjustment of the control parameters such as the weight parameters and the threshold 
values can be learned to increase the difference between the undesirable output data and the output data. 

Next, a sixth embodiment of the first modification is described. 

Generally, a statistical entropy $S_e$ is utilized to implement a method for approximating a true probability distribution $p_i$ (i = 1 to m) by utilizing a probability model $q_i$ (i = 1 to m) which is defined to estimate the true probability distribution $p_i$. That is, the statistical entropy $S_e$ is defined by the following equation.

$$S_e = \sum_{i=1}^{m} p_i \ln(q_i / p_i) \qquad (26)$$

where $p_i$ is the probability of generating an event $\omega_i$, and $p_i > 0$ and $p_1 + p_2 + \cdots + p_m = 1$ are satisfied. Moreover, $q_i$ is an assumed probability for estimating the probability $p_i$, and $q_i > 0$ and $q_1 + q_2 + \cdots + q_m = 1$ are satisfied.

The statistical entropy $S_e$ satisfies the following equation according to statistics.

$$S_e \approx (1/n) \ln(P_a) \qquad (27)$$

where $P_a$ is the probability that an assumed distribution of n actual values determined by utilizing the assumed probability model $q_i$ is in accord with the true probability distribution $p_i$.

In short, a model for maximizing $S_e$ is the most appropriate model to estimate the true probability distribution $p_i$. That is, it is possible to provide the most appropriate output data by converting input data.

When the learning must be implemented by utilizing a limited number of items of training example data, it is necessary to select the most appropriate model in which the attraction and repulsion terms $r_+$, $r_-$, the weight parameters, and the threshold values are included. In this case, the true probability distribution $p_i$ is equivalent to a distribution of the difference between the teaching signal data and the output data.
The equation (26) is changed to the following equation.

$$S_e = \sum_{i=1}^{m} p_i \ln(q_i) - \sum_{i=1}^{m} p_i \ln(p_i) \qquad (28)$$

Because the second term of the equation (28) depends only on the true probability distribution $p_i$, the second term can be omitted.

A logarithmic likelihood l(q) is defined as follows. 

$$l(q) \equiv \sum_{i=1}^{m} n_i \ln(q_i) \qquad (29)$$

In this case, the following equation holds according to the law of large numbers.

$$\lim_{n \to \infty} l(q) = \lim_{n \to \infty} \Bigl[\, n \sum_{i=1}^{m} p_i \ln(q_i) \Bigr] \qquad (30)$$

where $n_1 + n_2 + \cdots + n_m = n$, and $n_i$ is the number of times that an event $\omega_i$ has actually occurred.

In short, when a model is selected so as to increase the value of the logarithmic likelihood l(q), the 
model is appropriate because the statistical entropy Se is increased. 
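
The relation between the statistical entropy of equation (26) and the logarithmic likelihood of equation (29) can be checked numerically with a small sketch. The distributions and counts below are hypothetical illustration values, and the function names are not part of the specification.

```python
import math

def statistical_entropy(p, q):
    """S_e = sum_i p_i * ln(q_i / p_i), following equation (26)."""
    return sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))

def log_likelihood(counts, q):
    """l(q) = sum_i n_i * ln(q_i), following equation (29)."""
    return sum(ni * math.log(qi) for ni, qi in zip(counts, q))

p_true = [0.5, 0.3, 0.2]      # hypothetical true distribution p_i
counts = [50, 30, 20]         # hypothetical observed counts n_i (n = 100)
good_model = [0.5, 0.3, 0.2]  # model q_i equal to p_i
poor_model = [0.2, 0.3, 0.5]  # model q_i that mismatches p_i

se_good = statistical_entropy(p_true, good_model)
se_poor = statistical_entropy(p_true, poor_model)
ll_good = log_likelihood(counts, good_model)
ll_poor = log_likelihood(counts, poor_model)
```

With these values the model that matches the true distribution has both the larger entropy (zero, its maximum) and the larger logarithmic likelihood, which is the ordering the text relies on.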

Generally, a model for maximizing the value of the logarithmic likelihood l(q) is called a model of the maximum likelihood procedure in statistics. The model of the maximum likelihood procedure is well-known.
Based on the above well-known theory, the inventor of the present invention has provided a learning method and learning apparatus as follows.

The event $\omega_i$ is equivalent to the desired output data because it is desired to generate the event $\omega_i$. On the other hand, another true probability distribution $1 - p_i$ is defined to estimate the probability that a repulsive event, that is, any event except for the event $\omega_i$, is generated. In addition, another model $1 - q_i$ estimating the true probability distribution $1 - p_i$ is defined. Therefore, the repulsive event is equivalent to the undesirable output data because the generation of the repulsive event is not desired.

In this case, the logarithmic likelihood l(q) can be rewritten by adding a term relating to the repulsive 
event as follows. 

$$l(q) = \sum_{i=1}^{m} n_i \ln(N_i q_i / N_0) + \sum_{i=1}^{m} (N_i - n_i) \ln\{N_i (1 - q_i) / N_0\} \qquad (31)$$

where $N_0$ is the number of observations, $N_i$ is the number of times that attention is paid to the event $\omega_i$, and

$$N_0 = N_1 + N_2 + \cdots + N_m.$$

Moreover, $N_i - n_i$ is the number of repulsive events which are actually observed when attention is paid to the event $\omega_i$.

In short, the model of the maximum likelihood procedure is determined according to information that either the event $\omega_i$ or the repulsive event is observed when attention is paid to the event $\omega_i$.

$N_i/N_0$ in the equation (31) is not related to the determination of the model, so $N_i/N_0$ can be deleted. Therefore, the equation (31) is rewritten as follows.

$$l_E(q) = \sum_{i=1}^{m} n_i \ln(q_i) + \sum_{i=1}^{m} (N_i - n_i) \ln(1 - q_i) \qquad (32)$$


A new logarithmic likelihood l E (q) is then extended to a continuous variable model so that the attraction 
and repulsion terms r+, r- are determined. 
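
Equation (32) can be evaluated directly for a discrete model. The sketch below is an illustration only: the counts are hypothetical, and the comparison simply shows that a model with q_i = n_i / N_i scores higher than a mismatched one.

```python
import math

def extended_log_likelihood(n, big_n, q):
    """l_E(q) = sum_i n_i ln(q_i) + sum_i (N_i - n_i) ln(1 - q_i), equation (32)."""
    return sum(
        ni * math.log(qi) + (Ni - ni) * math.log(1.0 - qi)
        for ni, Ni, qi in zip(n, big_n, q)
    )

n = [8, 2]        # hypothetical counts of the events actually observed
big_n = [10, 10]  # hypothetical counts of how often attention was paid to each event

q_matched = [0.8, 0.2]  # q_i = n_i / N_i
q_other = [0.5, 0.5]    # a mismatched model

l_matched = extended_log_likelihood(n, big_n, q_matched)
l_other = extended_log_likelihood(n, big_n, q_other)
```

The second sum is what brings the repulsive events into the likelihood: each observation in which the event did not occur rewards a small q_i.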

Therefore, the assumed probability distribution $q_i$ is replaced with a density distribution function $g(x|\theta)$ in the continuous variable model. A variable x representing an event is the difference between the teaching signal data and the output data, and a variable $\theta$ represents all variable parameters in the model of the maximum likelihood procedure.

Thereafter, a region $S_i$ adjacent to a prescribed variable $x_i$ is defined so that the density distribution function $g(x|\theta)$ is integrated in the region $S_i$ to represent a distribution function $G(x_i|\theta)$ in a discrete variable model as follows.

$$G(x_i|\theta) = \int_{S_i} g(x|\theta)\,dx \qquad (33)$$

Therefore, the logarithmic likelihood $l_E(q)$ is rewritten by utilizing the integrated distribution function $G(x_i|\theta)$.

$$l_E(\theta) = \sum_{i} n_i \ln G(x_i|\theta) + \sum_{i} (N_i - n_i) \ln\{1 - G(x_i|\theta)\} \qquad (34)$$

where it is defined that an attractive event x is generated when an event is generated in the region S 
adjacent to the variable x, and it is defined that a repulsive event x R is generated when an event is 
generated in an external region except for the region S. 

Accordingly, the sum of one probability to generate the attractive event x and another probability to generate the repulsive event x R equals 1.

The shape or the size of the region S is determined according to properties of the teaching signal data. 

Generally, it is preferable that the region S be a hypersphere or a hypercube. Moreover, it is not necessary that the region S be closed.

The logarithmic likelihood $l_E(\theta)$ defined in the equation (34) can be rewritten by considering that the calculation is implemented when the event x or $x_R$ is generated.

$$l_E(\theta) = \sum_{p:D} \ln G(x_p|\theta) + \sum_{p:U} \ln\{1 - G(x_p|\theta)\} \qquad (35)$$

where p is an index for identifying the teaching signal data. $\sum_{p:D}$ indicates that the addition is made when the attractive event x is generated, and $\sum_{p:U}$ indicates that the addition is made when the repulsive event $x_R$ is generated.

In short, a model for maximizing the logarithmic likelihood $l_E(\theta)$ defined in the equation (35) is the model of the maximum likelihood procedure. Therefore, the parameters $\theta$ are adjusted to obtain the maximum likelihood model.

In the present invention, because the loss function r to be minimized by adjusting the control 
parameters is utilized, the loss function r is defined in the sixth embodiment of the first modification as 
follows. 

$$r = -\,l_E(\theta) \qquad (36)$$

To simplify the description of the sixth embodiment, N = 1 is assumed. Moreover, the region S is 
represented for the p-th set of training example data by the following equation. 

$$x_p - s/2 \le x \le x_p + s/2 \qquad (37)$$

A parameter s is the width of the region S and is determined on condition that the shift of the density distribution function $g(x_p|\theta)$ in the region S does not exert an influence on the adjustment of the parameters. Therefore, $G(x_p|\theta) = s\,g(x_p|\theta)$ holds. Moreover, because $s\,g(x_p|\theta)$ equals a very small value in a practical calculation, $\ln\{1 - s\,g(x_p|\theta)\} \approx -s\,g(x_p|\theta)$ holds. Therefore, the logarithmic likelihood $l_E(\theta)$ defined in the equation (35) can be rewritten as follows.

$$l_E(\theta) = \sum_{p:D} \ln g(x_p|\theta) - s \sum_{p:U} g(x_p|\theta) \qquad (38)$$

Therefore, the loss function r is defined by applying the equations (36), (38) as follows.

$$r = -\sum_{p:D} \ln g(x_p|\theta) + s \sum_{p:U} g(x_p|\theta) \qquad (39)$$

where constants not depending on the parameters are omitted in the equations (38), (39).
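
The step from the equation (35) to the equations (38) and (39) can be written out explicitly. The following is a sketch that uses only the two approximations stated in the text, namely that G is s times g within the narrow region S and that ln(1 - u) is close to -u for small u.

```latex
\begin{align*}
l_E(\theta)
  &= \sum_{p:D} \ln G(x_p|\theta) + \sum_{p:U} \ln\{1 - G(x_p|\theta)\} \\
  &\approx \sum_{p:D} \ln\{s\, g(x_p|\theta)\} + \sum_{p:U} \ln\{1 - s\, g(x_p|\theta)\} \\
  &\approx \sum_{p:D} \ln g(x_p|\theta) + \sum_{p:D} \ln s - s \sum_{p:U} g(x_p|\theta)
\end{align*}
```

The term $\sum_{p:D} \ln s$ does not depend on $\theta$, so it is one of the constants that can be dropped without changing which parameters maximize the likelihood.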

The density distribution function $g(x|\theta)$ is assumed to be represented by a normal distribution in which the mean value is zero and the variance is $\sigma^2$. That is, the density distribution function $g(x|\theta)$ is defined as follows.

$$g(x|\theta) = \{1/((2\pi)^{1/2}\sigma)\} \exp\{-x^2/(2\sigma^2)\} \qquad (40)$$

The attraction and the repulsion terms $r_+$, $r_-$ are respectively obtained according to the equations (39), (40) as follows.

$$r_+ = \sum_{p:D} r_+^p = -\sum_{p:D} \ln g(x_p|\theta) = \sum_{p:D} \{\ln \sigma + x_p^2/(2\sigma^2)\} \qquad (41)$$

$$r_- = \sum_{p:U} r_-^p = s \sum_{p:U} g(x_p|\theta) = \{s/((2\pi)^{1/2}\sigma)\} \sum_{p:U} \exp\{-x_p^2/(2\sigma^2)\} \qquad (42)$$

where $x_p = O_1^p - y_1^p$ is obtained by utilizing both the output data $O_1^p$ and the teaching signal data $y_1^p$, and the parameter s designating the width of the region S is a parameter for adjusting the ratio of the contribution of the undesirable output data to the contribution of the desired output data. The constant $\ln (2\pi)^{1/2}$ is omitted in the equation (41).
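
The per-example attraction and repulsion terms for the normal density can be sketched as two small functions. The values of sigma and s below are hypothetical, and the function names are introduced only for illustration.

```python
import math

def attraction_term(x, sigma):
    """ln(sigma) + x**2 / (2 sigma**2): one summand of equation (41), constant omitted."""
    return math.log(sigma) + x * x / (2.0 * sigma * sigma)

def repulsion_term(x, sigma, s):
    """s / (sqrt(2 pi) sigma) * exp(-x**2 / (2 sigma**2)): one summand of equation (42)."""
    return s / (math.sqrt(2.0 * math.pi) * sigma) * math.exp(-x * x / (2.0 * sigma * sigma))

sigma, s = 0.5, 0.1   # hypothetical standard deviation and region width
near, far = 0.1, 1.0  # two hypothetical differences x_p between output and teaching signal

attraction_near = attraction_term(near, sigma)
attraction_far = attraction_term(far, sigma)
repulsion_near = repulsion_term(near, sigma, s)
repulsion_far = repulsion_term(far, sigma, s)
```

The attraction term grows with the difference, so minimizing it pulls the output toward desired data; the repulsion term shrinks with the difference, so minimizing it pushes the output away from undesirable data. Increasing s raises the weight of the repulsion relative to the attraction.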

The attraction and repulsion terms $r_+$, $r_-$ are respectively minimized by adjusting the control parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, $\theta^H_j$, and $\theta^O_k$ and the variance $\sigma^2$ in the learning apparatus shown in Fig. 2.
Fig. 11 is a flowchart showing a learning method performed in the learning apparatus shown in Fig. 2 according to the sixth embodiment of the first modification.

As shown in Fig. 11, a parameter p designating a set number of items of training example data $\{I_i^p, y_k^p, Tp\}$ composed of the teaching signal data $y_k^p$, the input data $I_i^p$, and the symbol Tp is set to 1 (p = 1) in a step 201 when the training example data $\{I_i^p, y_k^p, Tp\}$ is provided to the learning apparatus by an operator. Thereafter, L items of input data $I_i^p$ are provided to the first neurons 22, and N items of teaching signal data $y_k^p$ and the symbol Tp are automatically transmitted to the loss function calculation section 29 in a step 202.

Thereafter, N items of output data $O_k^p$ are calculated in the 3-layer feedforward artificial neural network by applying the equations (1) to (5) before the output data $O_k^p$ is transmitted to the loss function calculation section 29 in a step 203.

Thereafter, whether the teaching signal data $y_k^p$ is the desired output data or the undesirable output data is judged by inspecting the symbol Tp in a prescribed control section (not shown) in a step 204. That is, in cases where Tp indicates the symbol +, the teaching signal data $y_k^p$ transmitted to the section 29 is judged to be the desired output data. On the other hand, in cases where Tp indicates the symbol -, the teaching signal data $y_k^p$ transmitted to the section 29 is judged to be the undesirable output data.

In cases where the teaching signal data $y_k^p$ is judged to be the desired output data in the step 204, a value of the attraction term $r_+^p$ is calculated in the attraction term calculating section 30 by utilizing the N items of desired output data according to the equation (41), and the calculated value is stored in a prescribed memory (not shown) of the section 30 in a step 205. Thereafter, the control parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, $\theta^H_j$, and $\theta^O_k$ utilized in the first and second connection sections 27, 28 are transmitted to the control parameter adjusting section 32 so that partial derivatives $\partial r_+^p/\partial W^{HI}_{ji}$, $\partial r_+^p/\partial W^{OH}_{kj}$, $\partial r_+^p/\partial \theta^H_j$, $\partial r_+^p/\partial \theta^O_k$, and $\partial r_+^p/\partial \sigma$ are calculated by utilizing the value $r_+^p$ stored in the section 30 in the control parameter adjusting section 32. Then, renewed values $\Delta W^{HI}_{ji}$, $\Delta W^{OH}_{kj}$, $\Delta\theta^H_j$, $\Delta\theta^O_k$, and $\Delta\sigma$ are calculated to minimize the attraction term $r_+^p$ as follows.

$$\Delta W^{OH}_{kj}(t+1) = -(1-\alpha)\,\eta\,\partial r_+^p/\partial W^{OH}_{kj} + \alpha\,\Delta W^{OH}_{kj}(t) \qquad (43)$$

$$\Delta W^{HI}_{ji}(t+1) = -(1-\alpha)\,\eta\,\partial r_+^p/\partial W^{HI}_{ji} + \alpha\,\Delta W^{HI}_{ji}(t) \qquad (44)$$

$$\Delta\theta^H_j(t+1) = -(1-\alpha)\,\eta\,\partial r_+^p/\partial \theta^H_j + \alpha\,\Delta\theta^H_j(t) \qquad (45)$$

$$\Delta\theta^O_k(t+1) = -(1-\alpha)\,\eta\,\partial r_+^p/\partial \theta^O_k + \alpha\,\Delta\theta^O_k(t) \qquad (46)$$

$$\Delta\sigma(t+1) = -(1-\alpha)\,\eta\,\partial r_+^p/\partial \sigma + \alpha\,\Delta\sigma(t) \qquad (47)$$

A method of minimization based on the above equations (43) to (47) is well known as the probabilistic-descent method in probability theory (Amari, S.: IEEE Trans. EC-16, 279 (1967)).
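
The common form of the equations (43) to (47) is a gradient step blended with the previous increment. A minimal sketch for the variance-related parameter is shown below; it assumes the attraction term of the equation (41) for a single example, and the values of eta, alpha, sigma, and x are hypothetical.

```python
def renewed_increment(prev_increment, grad, eta, alpha):
    """Delta(t+1) = -(1 - alpha) * eta * grad + alpha * Delta(t), the form of (43) to (47)."""
    return -(1.0 - alpha) * eta * grad + alpha * prev_increment

def d_attraction_d_sigma(x, sigma):
    """Derivative of ln(sigma) + x**2 / (2 sigma**2) with respect to sigma."""
    return 1.0 / sigma - x * x / sigma ** 3

eta, alpha = 0.01, 0.5   # hypothetical learning rate and momentum coefficient
sigma, delta = 0.5, 0.0  # hypothetical starting value and initial increment
x = 0.1                  # hypothetical difference between output and desired data

history = [sigma]
for _ in range(2):
    delta = renewed_increment(delta, d_attraction_d_sigma(x, sigma), eta, alpha)
    sigma += delta       # parameter(t+1) = parameter(t) + Delta(t+1)
    history.append(sigma)
```

With these values sigma moves from 0.5 toward |x|, where the derivative of the attraction term vanishes. A large eta combined with the momentum term can overshoot and drive sigma negative, so a small step is used here.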

Thereafter, renewed control parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, $\theta^H_j$, $\theta^O_k$, and $\sigma$ are calculated by applying the equations (17) to (20) and stored in a memory (not shown) of the section 32 in a step 206. The calculation for finding $\sigma$ is the same as that of the other control parameters.

In cases where the teaching signal data $y_k^p$ is judged to be the undesirable output data in the step 204, a value of the repulsion term $r_-^p$ is calculated in the repulsion term calculating section 31 by utilizing the N items of undesirable output data according to the equation (42), and the calculated value is stored in a prescribed memory (not shown) of the section 31 in a step 207. Thereafter, the control parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, $\theta^H_j$, and $\theta^O_k$ utilized in the first and second connection sections 27, 28 are transmitted to the control parameter adjusting section 32 so that partial derivatives $\partial r_-^p/\partial W^{HI}_{ji}$, $\partial r_-^p/\partial W^{OH}_{kj}$, $\partial r_-^p/\partial \theta^H_j$, $\partial r_-^p/\partial \theta^O_k$, and $\partial r_-^p/\partial \sigma$ are calculated by utilizing the values $r_-^p$ stored in the section 31 in the control parameter adjusting section 32. Then, renewed values $\Delta W^{HI}_{ji}$, $\Delta W^{OH}_{kj}$, $\Delta\theta^H_j$, $\Delta\theta^O_k$, and $\Delta\sigma$ are calculated to minimize the repulsion term $r_-^p$ as follows.

$$\Delta W^{OH}_{kj}(t+1) = -(1-\alpha)\,\eta\,\partial r_-^p/\partial W^{OH}_{kj} + \alpha\,\Delta W^{OH}_{kj}(t) \qquad (48)$$

$$\Delta W^{HI}_{ji}(t+1) = -(1-\alpha)\,\eta\,\partial r_-^p/\partial W^{HI}_{ji} + \alpha\,\Delta W^{HI}_{ji}(t) \qquad (49)$$

$$\Delta\theta^H_j(t+1) = -(1-\alpha)\,\eta\,\partial r_-^p/\partial \theta^H_j + \alpha\,\Delta\theta^H_j(t) \qquad (50)$$

$$\Delta\theta^O_k(t+1) = -(1-\alpha)\,\eta\,\partial r_-^p/\partial \theta^O_k + \alpha\,\Delta\theta^O_k(t) \qquad (51)$$

$$\Delta\sigma(t+1) = -(1-\alpha)\,\eta\,\partial r_-^p/\partial \sigma + \alpha\,\Delta\sigma(t) \qquad (52)$$

Thereafter, renewed control parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, $\theta^H_j$, $\theta^O_k$, and $\sigma$ are calculated by applying the equations (17) to (20) and stored in a memory (not shown) of the section 32 in a step 208.

Thereafter, whether or not the parameter p equals the number Q of training example data is judged in the prescribed control section in a step 209. In cases where the parameter p is judged not to equal the number Q, the parameter p is incremented by 1 in a step 210 so that the procedures from the step 202 to the step 209 are repeated until the attraction and repulsion terms $r_+^p$, $r_-^p$ are stored throughout all the sets of training example data in the memory of the sections 29, 32.

In cases where the parameter p is judged to equal the number Q in the step 209, the judgement means that all sets of training example data have been calculated to provide Q sets of output data for the section 29. Moreover, the judgement means that the attraction and repulsion terms $r_+^p$, $r_-^p$ are stored throughout all the sets of training example data in the memory of the sections 29, 32. Therefore, the value of the loss function r is found by utilizing the attraction and repulsion terms $r_+^p$, $r_-^p$ stored in the memory of the section 29 according to the equation (8) in a step 211.

Thereafter, whether or not the value of the loss function r satisfies the finishing condition is judged by applying the equation (21) in a step 212. That is, in cases where the value of the loss function r is equal to or greater than $E_L$, the finishing condition is not satisfied. Therefore, the procedures from the step 201 to the step 212 are repeated.

The repetition of the procedures from the step 201 to the step 212 is continued until the finishing condition is satisfied.

In cases where the value of the loss function r is less than $E_L$, the finishing condition is satisfied in the step 212 so that the learning in the learning apparatus is completed. That is, the control parameters such as the weight parameters, the threshold values, and the variances have been adjusted to the most appropriate values.
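
The flow of the steps 201 to 212 can be sketched on a deliberately reduced toy problem. The sketch below is not the 3-layer network of Fig. 2: it assumes a hypothetical model whose output is a single adjustable parameter w, it holds sigma fixed (so the constant ln(sigma) is left out of the attraction term), and all numerical values are illustration values.

```python
import math

def train(examples, w=0.3, sigma=0.5, s=0.1, eta=0.1, alpha=0.5,
          e_l=0.05, t_max=1000):
    """Toy version of steps 201 to 212 for a model whose output is just w.

    `examples` is a list of (y, symbol) pairs, with symbol '+' for desired and
    '-' for undesirable teaching signal data. Returns (w, iterations_used).
    """
    coeff = s / (math.sqrt(2.0 * math.pi) * sigma)
    delta = 0.0
    for t in range(1, t_max + 1):
        loss = 0.0
        for y, symbol in examples:       # steps 201, 202, 209, 210: visit each example
            x = w - y                    # step 203: output minus teaching signal
            if symbol == '+':            # step 204: inspect the symbol
                term = x * x / (2.0 * sigma ** 2)                      # step 205
                grad = x / sigma ** 2
            else:
                term = coeff * math.exp(-x * x / (2.0 * sigma ** 2))   # step 207
                grad = -term * x / sigma ** 2
            delta = -(1.0 - alpha) * eta * grad + alpha * delta        # steps 206, 208
            w += delta
            loss += term
        if loss < e_l:                   # steps 211, 212: finishing condition
            return w, t
    return w, t_max

w_final, iterations = train([(1.0, '+'), (0.0, '-')])
```

Starting from w = 0.3, the desired value 1.0 attracts the output and the undesirable value 0.0 repels it, so both updates move w upward until the summed loss falls below the threshold.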

Accordingly, after the control parameters are adjusted to the most appropriate values, the operator can accomplish the most appropriate diagnosis, the most appropriate robot control, or the like without utilizing the section 29.

Moreover, the learning can be smoothly implemented in the same manner as in the first embodiment of 
the first modification. 

Because the variance $\sigma^2$ is adjusted, the application of the learning method utilizing the undesirable output data can be enlarged.
In cases where N is larger than 1, the density distribution function $g(x|\theta)$ is represented by an N-dimensional normal distribution.

In the sixth embodiment of the first modification, the density distribution function $g(x|\theta)$ is defined according to the equation (40). However, it is preferable that the function $g(x|\theta)$ utilized in early iterative calculations be halfway replaced with a desired function. The desired function is determined by referring to the actual difference between the teaching signal data and the output data which is obtained by iteratively calculating the loss function.

Moreover, N items of teaching signal data corresponding to a p-th set of input data in the N dimensions are either N items of desired output data or N items of undesirable output data in the sixth embodiment. However, it is preferable that both the desired output data and the undesirable output data be intermingled in the N items of teaching signal data.

Specifically, in cases where an N-dimensional space is divided between an $L_A$-dimensional space $S_A$ and an $L_B$-dimensional space $S_B$, either $L_A$ items of desired or undesirable output data are provided in the space $S_A$, and either $L_B$ items of desired or undesirable output data are provided in the space $S_B$.

Moreover, the density distribution function $g^A(x^A|\theta^A)$ is assumed in the space $S_A$, and the density distribution function $g^B(x^B|\theta^B)$ is assumed in the space $S_B$. In addition, the integrated distribution function $G^A(x_p^A|\theta^A)$ is assumed in the space $S_A$, and the integrated distribution function $G^B(x_p^B|\theta^B)$ is assumed in the space $S_B$.

In this case, the loss function r is effected by applying the equations (35), (36) as follows. 

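$$\begin{aligned}
r = &-\sum_{p:DD} \ln\{G^A(x_p^A|\theta^A)\,G^B(x_p^B|\theta^B)\} \\
    &-\sum_{p:DU} \ln[G^A(x_p^A|\theta^A)\,\{1 - G^B(x_p^B|\theta^B)\}] \\
    &-\sum_{p:UD} \ln[\{1 - G^A(x_p^A|\theta^A)\}\,G^B(x_p^B|\theta^B)] \\
    &-\sum_{p:UU} \ln[\{1 - G^A(x_p^A|\theta^A)\}\,\{1 - G^B(x_p^B|\theta^B)\}]
\end{aligned} \qquad (53)$$
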

where p:DD indicates that the desired output data is provided in both the spaces $S_A$, $S_B$, p:DU indicates that the desired output data is provided in the space $S_A$ and the undesirable output data is provided in the space $S_B$, p:UD indicates that the undesirable output data is provided in the space $S_A$ and the desired output data is provided in the space $S_B$, and p:UU indicates that the undesirable output data is provided in both the spaces $S_A$, $S_B$.

The loss function r defined by the equation (53) is rewritten by applying the equation (39) as follows.

$$\begin{aligned}
r &= -\sum_{p:D} \ln\{s_A\, g^A(x_p^A|\theta^A)\} - \sum_{p:D} \ln\{s_B\, g^B(x_p^B|\theta^B)\} + s_A \sum_{p:U} g^A(x_p^A|\theta^A) + s_B \sum_{p:U} g^B(x_p^B|\theta^B) \\
  &= r_+^A + r_+^B + r_-^A + r_-^B
\end{aligned} \qquad (54)$$

The parameter $\theta$ in the sixth embodiment represents the weight parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, the threshold values $\theta^H_j$, $\theta^O_k$, and the variance $\sigma^2$. Therefore, the variance $\sigma^2$ is, for example, slightly renewed as follows.

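$$\Delta\sigma_A(t+1) = -(1-\alpha)\,\eta\,\partial r_+^A/\partial \sigma_A + \alpha\,\Delta\sigma_A(t) \qquad (55)$$

$$\Delta\sigma_B(t+1) = -(1-\alpha)\,\eta\,\partial r_+^B/\partial \sigma_B + \alpha\,\Delta\sigma_B(t) \qquad (56)$$

$$\Delta\sigma_A(t+1) = -(1-\alpha)\,\eta\,\partial r_-^A/\partial \sigma_A + \alpha\,\Delta\sigma_A(t) \qquad (57)$$

$$\Delta\sigma_B(t+1) = -(1-\alpha)\,\eta\,\partial r_-^B/\partial \sigma_B + \alpha\,\Delta\sigma_B(t) \qquad (58)$$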

Accordingly, the learning can be implemented by minimizing the loss function r regardless of whether 
the desired output data and the undesirable output data are intermingled in the same set of teaching signal 
data. 

Next, a seventh embodiment of the first modification is described with reference to Figs. 12, 13. 
In the seventh embodiment, the probability density function $g(x|\theta)$ defined by the equation (40) is replaced with a generalized Gaussian distribution function $Ga(x|\theta)$.
That is, the function $Ga(x|\theta)$ is defined as follows.

$$Ga(x|\theta) = \{2\,b^{1/b}\,\sigma\,\Gamma(1/b + 1)\}^{-1} \exp\{-|x|^b/(b\,\sigma^b)\} \qquad (59)$$

where $\Gamma(\cdot)$ is the Gamma function, and $\sigma$ and b are parameters.

The parameter b is added to the control parameter $\theta$ as compared with the sixth embodiment.

Fig. 12 is a graphic view of the generalized Gaussian distribution function $Ga(x|\theta)$.

As shown in Fig. 12, generalized Gaussian distributions are shown in cases of b = 1, b = 2, b = 10, and b = ∞.

In cases where the parameter b equals 2 (b = 2), the function $Ga(x|\theta)$ is in accord with the function $g(x|\theta)$ defined by the equation (40). In other words, in cases where the distribution of the difference x between the teaching signal data and the output data deviates from the normal distribution $g(x|\theta)$, the learning can be implemented by utilizing the function $Ga(x|\theta)$ in the seventh embodiment.
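
The statement that the generalized Gaussian function of the equation (59) reduces to the normal density of the equation (40) for b = 2 can be checked numerically. The sketch below uses only the Python standard library; the function names and the test values are introduced for illustration.

```python
import math

def generalized_gaussian(x, sigma, b):
    """Ga(x) = {2 b**(1/b) sigma Gamma(1/b + 1)}**-1 * exp(-|x|**b / (b sigma**b))."""
    norm = 2.0 * b ** (1.0 / b) * sigma * math.gamma(1.0 / b + 1.0)
    return math.exp(-abs(x) ** b / (b * sigma ** b)) / norm

def normal_density(x, sigma):
    """Zero-mean normal density of equation (40)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

# Largest deviation between the two densities over a few sample points for b = 2.
max_gap = max(
    abs(generalized_gaussian(x, 0.7, 2) - normal_density(x, 0.7))
    for x in (-1.5, -0.3, 0.0, 0.4, 2.0)
)
# For b = 1 and sigma = 1 the function is exp(-|x|) / 2, so its peak value is 0.5.
peak_b1 = generalized_gaussian(0.0, 1.0, 1)
```

Larger values of b flatten the top of the distribution and steepen its sides, which is why adjusting b lets the model follow error distributions that are not normal.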

A loss function r according to the seventh embodiment is defined by the following equation (60) in the 
same manner as in the sixth embodiment in which the loss function r is defined by the equation (39). 



$$r = -\sum_{p:D} \ln Ga(x_p|\theta) + s \sum_{p:U} Ga(x_p|\theta) \qquad (60)$$



The loss function r defined by the equation (60) is rewritten by substituting the function $Ga(x|\theta)$ defined by the equation (59) into the equation (60) after $x_p = O_k^p - y_k^p$ is substituted in the equation (59).



$$r = r_+ + r_- \qquad (61)$$

(62)

(63)

(64)

In cases where N is larger than 1, the distribution function $Ga(x|\theta)$ is represented by an N-dimensional generalized Gaussian distribution.

The learning is implemented by renewing the control parameter $\theta$ for each set of training example data. Therefore, a renewed value of the control parameter $\theta$ is found in the same manner as in the sixth embodiment as follows.

In cases where the teaching signal data $y_1^p$ is the desired output data,

$$\Delta\theta(t+1) = -(1-\alpha)\,\eta\,\partial r_+^p/\partial \theta + \alpha\,\Delta\theta(t) \qquad (65)$$

where the control parameter $\theta$ represents the weight parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, the threshold values $\theta^H_j$, $\theta^O_k$, and the parameters $\sigma$, b.
In cases where the teaching signal data $y_1^p$ is the undesirable output data,

$$\Delta\theta(t+1) = -(1-\alpha)\,\eta\,\partial r_-^p/\partial \theta + \alpha\,\Delta\theta(t) \qquad (66)$$

Therefore, the renewed control parameter $\theta$ is found as follows.

$$\theta(t+1) = \theta(t) + \Delta\theta(t+1) \qquad (67)$$

The learning is implemented by being divided into two phases. The parameter b is set at 2 (b = 2) in a first phase, so that the learning in the first phase is continued until the value of the loss function r decreases to less than a certain small value $E_{L1}$. After the first phase is accomplished, the control parameter $\theta$ including the parameter b is re-adjusted in a second phase. The learning in the second phase is continued until the value of the loss function r decreases to less than a certain small value $E_{L2}$.

The above learning for renewing the control parameter $\theta$ and for minimizing the loss function r is implemented by utilizing the probabilistic-descent method in the same manner as in the sixth embodiment.

Fig. 13 is a flowchart showing a learning method performed in the learning apparatus shown in Fig. 2 
according to the seventh embodiment of the first modification. The learning method is implemented in case 
of N = 1 in the seventh embodiment. 

As shown in Fig. 13, the first phase is set and the parameter b is set to 2 in a step 301. That is,
phase = 1 and b = 2 are effected. Thereafter, a parameter p designating a set number of training example
data $\{I_i^p, y_1^p, T_p\}$ composed of the teaching signal data $y_1^p$, the input data $I_i^p$, and the symbol $T_p$ is
set to 1 (p = 1) in a step 302 when the training example data $\{I_i^p, y_1^p, T_p\}$ is provided to the learning
apparatus by an operator. Thereafter, L items of input data $I_i^p$ are provided to the first neurons 22, and
the teaching signal data $y_1^p$ and the symbol $T_p$ are automatically transmitted to the loss function
calculating section 29 in a step 303.

Thereafter, output data $O_1^p$ is calculated in the 3-layer feedforward artificial neural network by applying
the equations (1) to (5) before the output data $O_1^p$ is transmitted to the loss function calculating section 29
in a step 304.
Thereafter, whether the teaching signal data $y_1^p$ is the desired output data or the undesirable output
data is judged by inspecting the symbol $T_p$ in a prescribed control section (not shown) in a step 305. That
is, in cases where $T_p$ indicates the symbol +, the teaching signal data $y_1^p$ transmitted to the section 29 is
judged to be the desired output data. On the other hand, in cases where $T_p$ indicates the symbol -, the
teaching signal data $y_1^p$ transmitted to the section 29 is judged to be the undesirable output data.

In cases where the teaching signal data $y_1^p$ is judged to be the desired output data in the step 305, a
value of the attraction term $r_+^p$ is calculated in the attraction term calculating section 30 by utilizing the
desired output data according to the equations (62), (64), and the calculated value is stored in a prescribed
memory (not shown) of the section 30 in a step 306. Thereafter, the control parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$,
$\theta^H_j$, $\theta^O_k$, and $\sigma$ utilized in the first and second connection sections 27, 28 are transmitted to the
control parameter adjusting section 32 so that partial derivatives $\partial r_+^p/\partial W^{HI}_{ji}$, $\partial r_+^p/\partial W^{OH}_{kj}$,
$\partial r_+^p/\partial\theta^H_j$, $\partial r_+^p/\partial\theta^O_k$, and $\partial r_+^p/\partial\sigma$ are calculated in the control parameter adjusting
section 32 by utilizing the value $r_+^p$ stored in the section 30, and are stored in a memory (not shown) of
the section 32 in a step 307. In addition, renewed values $\Delta W^{HI}_{ji}(t)$, $\Delta W^{OH}_{kj}(t)$, $\Delta\theta^H_j(t)$,
$\Delta\theta^O_k(t)$, and $\Delta\sigma(t)$ are calculated by applying the equation (65) so that the control parameters
$\theta$ {$W^{HI}_{ji}(t)$, $W^{OH}_{kj}(t)$, $\theta^H_j(t)$, $\theta^O_k(t)$, and $\sigma(t)$} are slightly renewed by applying the equation (67).
That is, renewed control parameters $W^{HI}_{ji}(t+1)$, $W^{OH}_{kj}(t+1)$, $\theta^H_j(t+1)$, $\theta^O_k(t+1)$, and $\sigma(t+1)$
are determined in the step 307.

In cases where the teaching signal data $y_1^p$ is judged to be the undesirable output data in the step 305,
a value of the repulsion term $r_-^p$ is calculated in the repulsion term calculating section 31 by utilizing the
undesirable output data according to the equations (63), (64), and the calculated value is stored in a
prescribed memory (not shown) of the section 31 in a step 308. Thereafter, the control parameters $W^{HI}_{ji}$,
$W^{OH}_{kj}$, $\theta^H_j$, $\theta^O_k$, and $\sigma$ utilized in the first and second connection sections 27, 28 are transmitted
to the control parameter adjusting section 32 so that partial derivatives $\partial r_-^p/\partial W^{HI}_{ji}$,
$\partial r_-^p/\partial W^{OH}_{kj}$, $\partial r_-^p/\partial\theta^H_j$, $\partial r_-^p/\partial\theta^O_k$, and $\partial r_-^p/\partial\sigma$ are calculated in the control
parameter adjusting section 32 by utilizing the value $r_-^p$ stored in the section 31, and are stored in the
memory of the section 32 in a step 309. In addition, renewed values $\Delta W^{HI}_{ji}(t)$, $\Delta W^{OH}_{kj}(t)$,
$\Delta\theta^H_j(t)$, $\Delta\theta^O_k(t)$, and $\Delta\sigma(t)$ are calculated by applying the equation (66) so that the control
parameters $\theta$ {$W^{HI}_{ji}(t)$, $W^{OH}_{kj}(t)$, $\theta^H_j(t)$, $\theta^O_k(t)$, and $\sigma(t)$} are slightly renewed by applying the
equation (67). That is, renewed control parameters $W^{HI}_{ji}(t+1)$, $W^{OH}_{kj}(t+1)$, $\theta^H_j(t+1)$, $\theta^O_k(t+1)$,
and $\sigma(t+1)$ are determined in the step 309.

Thereafter, whether or not the parameter p equals the number Q of items of training example data is
judged in the prescribed control section in a step 310. In cases where the parameter p is judged not to
equal the number Q, the parameter p is incremented by 1 in a step 311 so that the procedures from the
step 303 to the step 310 are repeated until the attraction and repulsion terms $r_+^p$, $r_-^p$ and the partial
derivatives $\partial r_+^p/\partial W^{HI}_{ji}$, $\partial r_+^p/\partial W^{OH}_{kj}$, $\partial r_+^p/\partial\theta^H_j$, $\partial r_+^p/\partial\theta^O_k$, $\partial r_-^p/\partial W^{HI}_{ji}$,
$\partial r_-^p/\partial W^{OH}_{kj}$, $\partial r_-^p/\partial\theta^H_j$, $\partial r_-^p/\partial\theta^O_k$, $\partial r_+^p/\partial\sigma$, and $\partial r_-^p/\partial\sigma$ are stored throughout
all the sets of training example data in the memories of the sections 29, 32. In addition, the renewed
values of the control parameters and the renewed control parameters are stored throughout all the sets of
training example data.

In cases where the parameter p is judged to equal the number Q in the step 310, the judgement means
that all sets of training example data have been calculated to provide Q sets of output data to the section
29. Moreover, the judgement means that the attraction and repulsion terms $r_+^p$, $r_-^p$ and the partial
derivatives $\partial r_+^p/\partial W^{HI}_{ji}$, $\partial r_+^p/\partial W^{OH}_{kj}$, $\partial r_+^p/\partial\theta^H_j$, $\partial r_+^p/\partial\theta^O_k$, $\partial r_-^p/\partial W^{HI}_{ji}$,
$\partial r_-^p/\partial W^{OH}_{kj}$, $\partial r_-^p/\partial\theta^H_j$, $\partial r_-^p/\partial\theta^O_k$, $\partial r_+^p/\partial\sigma$, and $\partial r_-^p/\partial\sigma$ are stored throughout
all the sets of training example data in the memories of the sections 29, 32. In addition, the judgement
means that the renewed values of the control parameters and the renewed control parameters are stored
throughout all the sets of training example data. Moreover, the value of the loss function r is then found
by utilizing the attraction and repulsion terms $r_+^p$, $r_-^p$ stored in the memory of the section 29 according
to the equations (61) to (64) in a step 312.

Thereafter, whether or not the value of the loss function r satisfies a first finishing condition is judged in
a step 313. That is, in cases where the value of the loss function r is equal to or greater than $E_{L1}$, the first
finishing condition is not satisfied, so the iteration number t is incremented by 1 in a step 314. Thereafter,
the procedures from the step 302 to the step 313 are iterated while utilizing the renewed control parameters
$W^{HI}_{ji}(t+1)$, $W^{OH}_{kj}(t+1)$, $\theta^H_j(t+1)$, $\theta^O_k(t+1)$, and $\sigma(t+1)$ in place of the control parameters
$W^{HI}_{ji}(t)$, $W^{OH}_{kj}(t)$, $\theta^H_j(t)$, $\theta^O_k(t)$, and $\sigma(t)$.

The iteration of the procedures from the step 302 to the step 314 is continued until the first finishing
condition is satisfied.

In cases where the value of the loss function r is less than $E_{L1}$, the first finishing condition is satisfied in
the step 313 so that the second phase is set in a step 315. That is, phase = 2 is effected and the parameter
b is handled as one of the control parameters. After the second phase is set, whether or not the value of the
loss function r satisfies a second finishing condition is judged in a step 316. That is, in cases where the
value of the loss function r is equal to or greater than $E_{L2}$, the second finishing condition is not satisfied so
that the procedures from the step 302 to the step 316 are iterated until the second finishing condition is
satisfied. In this case, the partial derivatives with respect to the control parameters, including
$\partial r_+^p/\partial b$ and $\partial r_-^p/\partial b$, are renewed and stored in the steps 307, 309.

When the second finishing condition is satisfied in the step 316, the learning in the learning apparatus is
completed. That is, the control parameters are adjusted to the most appropriate values.
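
The two-phase schedule of Fig. 13 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a single weight `w` stands in for all weight and threshold parameters, one gradient step is taken per pass instead of one renewal per training example, and the callables `loss` and `grad`, the toy loss, and the `max_iter` guard are hypothetical stand-ins rather than parts of the specification.

```python
def two_phase_learning(loss, grad, w, e_l1, e_l2, eta=0.1, max_iter=10_000):
    """Two-phase schedule: b is fixed at 2 in phase 1 and trained in phase 2.

    loss(w, b) returns the loss value r; grad(w, b) returns (dr/dw, dr/db).
    Both callables and max_iter are hypothetical stand-ins.
    """
    b, phase = 2.0, 1                # step 301: phase = 1, b = 2
    for _ in range(max_iter):
        r = loss(w, b)               # step 312: value of the loss function
        if phase == 1 and r < e_l1:  # step 313: first finishing condition
            phase = 2                # step 315: b becomes a control parameter
        if phase == 2 and r < e_l2:  # step 316: second finishing condition
            return w, b, phase
        dw, db = grad(w, b)
        w -= eta * dw
        if phase == 2:               # b is renewed only in the second phase
            b -= eta * db
    raise RuntimeError("finishing condition not reached")


def toy_loss(w, b):
    # Hypothetical loss whose minimum lies at w = 3, b = 1.5.
    return (w - 3.0) ** 2 + 0.1 * (b - 1.5) ** 2


def toy_grad(w, b):
    return 2.0 * (w - 3.0), 0.2 * (b - 1.5)


w, b, phase = two_phase_learning(toy_loss, toy_grad, w=0.0, e_l1=0.05, e_l2=0.001)
```

Because b stays at 2 until the first finishing condition holds, the first phase only has to search over the remaining parameters, which is the saving in calculation time that the embodiment aims at.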

In cases where the learning is implemented by utilizing the normal distribution $g(x^p|\theta)$, as is well known,
the learning can be implemented at the highest speed by applying the random fall method in probability
theory.

However, in cases where the parameters $\sigma$, b must be adjusted, the iterative calculations
become complicated, and the calculation time is largely increased. Therefore, the learning in the seventh
embodiment is divided into two phases. That is, the parameter b is fixed in the first phase so that the
calculation time can be largely decreased. And, because the control parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, $\theta^H_j$,
$\theta^O_k$, and $\sigma$ are almost adjusted to the most appropriate values in the first phase, the learning in the
second phase converges rapidly so that the second finishing condition can be rapidly satisfied.

Accordingly, after the control parameters are adjusted to the most appropriate values, the operator can 
accomplish the most appropriate diagnosis, the most appropriate robot control, or the like without utilizing 
the section 29. 

Moreover, the learning in the seventh embodiment is generalized as compared with the sixth embodiment.

The second phase is continued in the seventh embodiment until the second finishing condition is
satisfied. However, it is preferable that the procedures from the step 302 to the step 315 be repeated
several times before the procedures from the step 302 to the step 315 are repeated by utilizing the fixed
parameter b until the second finishing condition is satisfied.

Next, a learning method and a learning apparatus according to a second modification of the present
invention are described.

In the second modification, the teaching signal data provided to a learning apparatus is the desired
output data, the undesirable output data, or boundary specifying data. The boundary specifying data specifies
a boundary surface which divides a desired region from an undesired region in N-dimensional space. Output
data provided from the learning apparatus is expected to be positioned within the desired region and is
expected not to be positioned within the undesired region.

Fig. 14 is a block diagram of a learning apparatus according to the second modification of the present 
invention, the apparatus being conceptually shown. 

As shown in Fig. 14, the learning apparatus according to the second modification consists of:

the input terminal 11;

the conversion section 12;

a plurality of output terminals 51 for re-converting the intermediate data converted in the conversion
section 12 according to a prescribed calculation and for providing N items of output data $O_k$ (k = 1 to N)
which represent an N-dimensional output vector indicating output coordinates;

a loss function calculating section 52 for calculating a loss function by utilizing the teaching signal data
which is provided by an operator;

the attraction term calculating section 15 included in the section 52;

the repulsion term calculating section 16 included in the section 52;

a boundary term calculating section 53 included in the section 52 for calculating a boundary term $r_>$ of
the loss function by utilizing both the output data $O_k$ provided from the output terminals 51 and the
boundary specifying data, the value of the boundary term $r_>$ being decreased when the output
coordinates indicated by the N items of output data $O_k$ are shifted towards the desired region from the
undesired region divided by the boundary surface which is specified by the boundary specifying data; and

a control parameter adjusting section 54 for adjusting the control parameters $W_j$ utilized in the
conversion section 12 to decrease values of the attraction, repulsion, and boundary terms $r_+$, $r_-$, and $r_>$
calculated in the attraction, repulsion, and boundary term calculating sections 15, 16, 53 by utilizing the
teaching signal data, the input data, the output data, and the control parameters.

The output data $O_k$ is preferably close to the desired output data and should not be close to the
undesirable output data in the same manner as in the first modification. Moreover, it is desired that the
output coordinates indicated by the N items of output data $O_k$ be shifted from the undesired region to the
desired region.

In the above configuration, in cases where the desired output data or the undesirable output data is provided
to the learning apparatus, the output data is provided to the attraction or repulsion term calculating section
15 or 16 so that the value of the attraction or repulsion term $r_+$ or $r_-$ is calculated in the same manner as
in the first modification. Thereafter, the control parameters are adjusted so as to decrease the value of the
attraction or repulsion term $r_+$ or $r_-$ in the control parameter adjusting section 54 in the same manner as in
the first modification.

On the other hand, in cases where the boundary specifying data is provided to the learning apparatus, 
the output vector represented by the N items of output data O k is provided to the boundary term calculating 
section 53. The value of the boundary term r> is then calculated by utilizing both the output vector and the 
boundary specifying data. Thereafter, the control parameters are adjusted to decrease the value of the 
boundary term r> in the control parameter adjusting section 54. 

Accordingly, in cases where the boundary specifying data is provided to the learning apparatus, the 
output vector can be positioned within the desired region because the value of the boundary term r> is 
decreased. 

Next, a learning apparatus and a learning method for performing the learning of the learning apparatus 
according to a first embodiment of the second modification are described with reference to Figs. 15 to 19. 

Fig. 15 is a block diagram of a learning apparatus for learning by utilizing the back-propagation learning 
method in a 3-layer feedforward neural network according to the first embodiment of the second 
modification. 

As shown in Fig. 15, the learning apparatus for learning by utilizing training example data composed of
both teaching signal data $(y_k, a_k)$ (k = 1 to N) and input data $I_i$ (i = 1 to L) which is provided by an operator in
a 3-layer feedforward neural network, consists of:

the input layer 21 provided with the first neurons 22; 

the hidden layer 23 provided with the hidden neurons 24; 

the output layer 25 provided with the final neurons 26; 

the first connection section 27; 

the second connection section 28; 

a loss function calculating section 61 for calculating a value of a loss function relating to the teaching
signal data $(y_k, a_k)$ by utilizing the output data $O_k$ provided from the k-th final neuron 26 and the teaching
signal data corresponding to the output data $O_k$, the teaching signal data $(y_k, a_k)$ being desired output data,
undesirable output data, or boundary specifying data;

the attraction term calculating section 30 included in the section 61;

the repulsion term calculating section 31 included in the section 61;

a boundary term calculating section 62 included in the section 61 for calculating the value of a
boundary term $r_>$ of the loss function by utilizing both the output data $O_k$ and the boundary specifying data
$(y_k, a_k)$, the value of the boundary term $r_>$ being decreased when the output data $O_k$ is shifted towards a
desired region from an undesired region which are divided by a boundary surface designated by the
boundary specifying data; and

a control parameter adjusting section 63 for adjusting control parameters such as the weight parameters
$W^{HI}_{ji}$, $W^{OH}_{kj}$ utilized in the first and second connection sections 27, 28 and the threshold values
$\theta^H_j$, $\theta^O_k$ to decrease values of the terms $r_+$, $r_-$, and $r_>$ calculated in the sections 30, 31, and 62 by
utilizing the teaching signal data $(y_k, a_k)$, the input data $I_i$, the output data $O_k$, the weight parameters
$W^{HI}_{ji}$, $W^{OH}_{kj}$, and the threshold values $\theta^H_j$, $\theta^O_k$.

The 3-layer feedforward neural network is composed of the input layer 21, the hidden layer 23, and the 
output layer 25. 

Relational equations for finding the output data $O_k$ by utilizing the input data $I_i$ are defined in the
equations (1) to (5).

In the second modification, $y_k$ included in each item of teaching signal data designates a meaningful
physical quantity such as temperature or pressure in cases where the teaching signal data is either the
desired output data or the undesirable output data. Therefore, $a_k$ included in each item of teaching signal
data is ignored in cases where the teaching signal data is either the desired output data or the undesirable
output data.

Moreover, the attraction term r + relating to the desired output data is defined by the equation (6) and 
the repulsion term r- of the loss function is defined by the equation (7) in the same manner as in the first 
embodiment of the first modification. 

On the other hand, in cases where the teaching signal data is the boundary specifying data, N items of
output data $O_k$ represent an output vector $\vec{O}$ in an N-dimensional space. That is,

$$\vec{O} = (O_1, O_2, \ldots, O_N)$$

is effected. Moreover, N items of boundary specifying data $y_k$, $a_k$ represent boundary vectors $\vec{y}$, $\vec{a}$ in the
N-dimensional space. That is,

$$\vec{y} = (y_1, y_2, \ldots, y_N) \quad \text{and} \quad \vec{a} = (a_1, a_2, \ldots, a_N)$$

are effected. The boundary surface in the N-dimensional space is defined by utilizing the boundary vectors
$\vec{y}$, $\vec{a}$.

Fig. 16 shows the boundary surface specified by the boundary vectors $\vec{y}$, $\vec{a}$ in a three-dimensional
space.

As shown in Fig. 16, the boundary vector $\vec{a}$ gives a normal vector of the boundary surface to specify a
plane direction of the boundary surface. Moreover, the boundary vector $\vec{a}$ is directed
towards the desired region from the undesired region. The boundary surface, whose plane direction is
specified by the boundary vector $\vec{a}$, is given a passing point by the boundary vector $\vec{y}$. That is, the
boundary surface passes through the passing point.

The boundary term $r_>$ relating to the boundary specifying data is defined by utilizing an inner product

$$d = \vec{a} \cdot (\vec{O} - \vec{y})$$

determined by both the output vector $\vec{O}$ and the boundary vectors $\vec{y}$, $\vec{a}$, as follows.

$$r_> = \frac{1}{1 + \exp(d)} \qquad (68)$$

Fig. 17 shows the relation among the vectors $\vec{O}$, $\vec{y}$, and $\vec{a}$ in the three-dimensional space.

As shown in Fig. 17, the value of the inner product d is equivalent to the distance between the
boundary surface and the output coordinates designated by the output vector $\vec{O}$ on the assumption that the
boundary vector $\vec{a}$ is a unit vector. Moreover, the distance is a negative value when the output coordinates
are positioned in the undesired region, and a positive value when the output coordinates are positioned in
the desired region.

Therefore, the value of the inner product d is increased when the output vector $\vec{O}$ is shifted in the
direction specified by the boundary vector $\vec{a}$.

Fig. 18 is a graphic view of the relation between the boundary term $r_>$ and the value of the inner
product d.

As shown in Fig. 18, the value of the boundary term $r_>$ is decreased when the output vector $\vec{O}$ is shifted
to the desired region because the value of the inner product d is increased.
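
The behaviour shown in Figs. 17 and 18 can be checked numerically with a short Python sketch. It assumes that the inner product is d = a · (O − y), which is what the description of Fig. 17 suggests (the signed distance from the boundary surface when the normal vector is a unit vector). The function names and the sample coordinates are hypothetical.

```python
import math


def signed_distance(output, passing_point, normal):
    """Inner product d of the normal vector with (output - passing_point).

    Assumed form of d; it equals the signed distance from the boundary
    surface when `normal` is a unit vector.
    """
    return sum(a * (o - y) for o, y, a in zip(output, passing_point, normal))


def boundary_term(d):
    """Equation (68): r> = 1 / (1 + exp(d))."""
    return 1.0 / (1.0 + math.exp(d))


y = (0.0, 0.0, 0.0)  # passing point of the boundary surface
a = (0.0, 0.0, 1.0)  # unit normal directed towards the desired region

inside = boundary_term(signed_distance((0.0, 0.0, 2.0), y, a))    # desired region
outside = boundary_term(signed_distance((0.0, 0.0, -2.0), y, a))  # undesired region
on_surface = boundary_term(signed_distance((1.0, 5.0, 0.0), y, a))
```

With these sample points the term is smallest inside the desired region and largest in the undesired region, so decreasing it pushes the output vector across the boundary surface.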

In the above configuration, Q sets of training example data composed of both the input data and the
teaching signal data are provided to the learning apparatus in turn. Therefore, parameters, terms, data, and
the like are indicated by attaching a letter p as follows.

$I_i^p$, $O_k^p$, $r_+^p$, $r_-^p$, $r_>^p$, $y_k^p$, $a_k^p$, $d^p$

For example, the boundary term $r_>^p$ relating to the p-th boundary specifying data is indicated as follows.

$$r_>^p = \gamma_> \cdot \frac{1}{1 + \exp(d^p)} \qquad (69)$$

where $\gamma_>$ is a positive parameter for adjusting the influence of the boundary specifying data on the output
vector

$$\vec{O}^p = (O_1^p, O_2^p, \ldots, O_N^p).$$

The boundary term $r_>$ relating to Q sets of teaching signal data is defined as follows.

$$r_> = \sum_{p:B} r_>^p = \gamma_> \sum_{p:B} \frac{1}{1 + \exp(d^p)} \qquad (70)$$

where $\sum_{p:B}$ indicates the sum of all boundary terms $r_>^p$.

Moreover, the teaching signal data is distinguished by utilizing a symbol $T_p$ (p = 1 to Q). That is, the
p-th set of teaching signal data $y_k^p$ is the desired output data when $T_p$ indicates a symbol +, the p-th set
of teaching signal data $y_k^p$ is the undesirable output data when $T_p$ indicates a symbol -, and the p-th set of
teaching signal data $y_k^p$, $a_k^p$ is the boundary specifying data when $T_p$ indicates a symbol >.

When the p-th set of training example data is provided to the learning apparatus, N items of output data
$O_k^p$ are obtained by applying the equations (1) to (5) so that the value of the term $r_+^p$, $r_-^p$, or $r_>^p$ is
calculated by applying the equation (6), (7), or (69). Therefore, the loss function r relating to Q sets of
teaching signal data is defined as follows.

$$r = \frac{1}{Q}\left(\sum_{p:D} r_+^p + \sum_{p:U} r_-^p + \sum_{p:B} r_>^p\right) \qquad (71)$$

Therefore, because the values of the terms $r_+$, $r_-$, and $r_>$ are respectively decreased to adjust the
parameters and the threshold values, the parameters and the threshold values can be adjusted to the most
appropriate values by decreasing only the total loss function r, even when the desired output
data, the undesirable output data, and the boundary specifying data are intermingled in the Q sets of training
example data.

Generally, the contribution ratio among the desired output data, the undesirable output data, and the
boundary specifying data is shifted according to the properties of the training example data. Therefore, the
contribution ratio of the undesirable output data to the other training example data is adjusted by the
parameter $\gamma_-$ in the same manner as in the first embodiment of the first modification. Moreover, the
contribution ratio of the boundary specifying data to the other training example data is adjusted by the
parameter $\gamma_>$. That is, when the parameter $\gamma_>$ is increased, the boundary term $r_>$ is increased so that the
influence of the boundary specifying data is increased.

An absolute value of the boundary vector $\vec{a}$ is also considered as a parameter for adjusting the influence of
the boundary specifying data on the output data.

Next, the feature of the boundary term r> relating to the boundary specifying data is described. 

As shown in Fig. 18, because a partial derivative of the boundary term $r_>$ with respect to the inner
product $d^p$ gradually approaches the zero value as the distance between the boundary surface and the
output vector becomes larger in the undesired region, a large number of iterated calculations are required
to bring the output data within the desired region when the output vector is deeply positioned within the
undesired region.
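
The flattening of the gradient follows from the equation (68). As a short worked derivation (standard calculus, not reproduced from the specification), differentiating $r_> = 1/\{1 + \exp(d)\}$ with respect to d gives

$$\frac{\partial r_>}{\partial d} = -\frac{\exp(d)}{\{1 + \exp(d)\}^2} = -r_>\,(1 - r_>).$$

Deep in the undesired region d is a large negative value, so $r_>$ approaches 1 and the factor $(1 - r_>)$ approaches 0. The magnitude of the derivative is largest at the boundary surface (d = 0), where it equals 1/4, and it also vanishes deep in the desired region, where $r_>$ approaches 0.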

However, because the value of the boundary term $r_>$ is gradually shifted, the learning of the learning
apparatus can be stably implemented. That is, the output vector is reliably shifted into the desired region.
In addition, because the shape of the boundary term $r_>$ is simple, it is easy to analytically handle the
boundary term $r_>$.

Moreover, because the value of the boundary term $r_>$ is exponentially decreased, the influence of the
boundary specifying data is exerted on the limited output vectors which are positioned close to the
boundary surface. Therefore, the output vectors which have already been shifted into the desired region
are not influenced so much, so that the iterative calculations are implemented for only the other output
vectors which are still positioned within the undesired region. Accordingly, the iterative calculations are
guaranteed to be stably implemented.

Next, a learning method for implementing the learning of the learning apparatus to adjust the weight
parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$ and the threshold values $\theta^H_j$, $\theta^O_k$ to the most appropriate values by utilizing the
terms $r_+^p$, $r_-^p$, and $r_>^p$ in cases where Q sets of training example data are provided to the learning
apparatus is described.




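In the learning method according to the first embodiment of the second modification, partial derivatives
$\partial r/\partial W^{HI}_{ji}$, $\partial r/\partial W^{OH}_{kj}$, $\partial r/\partial\theta^H_j$, and $\partial r/\partial\theta^O_k$ are calculated as follows.

$$\frac{\partial r}{\partial W^{HI}_{ji}} = \frac{1}{Q}\sum_p \left(\frac{\partial r_+^p}{\partial W^{HI}_{ji}} + \frac{\partial r_-^p}{\partial W^{HI}_{ji}} + \frac{\partial r_>^p}{\partial W^{HI}_{ji}}\right) \qquad (72)$$

$$\frac{\partial r}{\partial W^{OH}_{kj}} = \frac{1}{Q}\sum_p \left(\frac{\partial r_+^p}{\partial W^{OH}_{kj}} + \frac{\partial r_-^p}{\partial W^{OH}_{kj}} + \frac{\partial r_>^p}{\partial W^{OH}_{kj}}\right) \qquad (73)$$

$$\frac{\partial r}{\partial \theta^H_j} = \frac{1}{Q}\sum_p \left(\frac{\partial r_+^p}{\partial \theta^H_j} + \frac{\partial r_-^p}{\partial \theta^H_j} + \frac{\partial r_>^p}{\partial \theta^H_j}\right) \qquad (74)$$

$$\frac{\partial r}{\partial \theta^O_k} = \frac{1}{Q}\sum_p \left(\frac{\partial r_+^p}{\partial \theta^O_k} + \frac{\partial r_-^p}{\partial \theta^O_k} + \frac{\partial r_>^p}{\partial \theta^O_k}\right) \qquad (75)$$

The renewed values $\Delta W^{HI}_{ji}$, $\Delta W^{OH}_{kj}$, $\Delta\theta^H_j$, and $\Delta\theta^O_k$ are obtained according to the equations (10), (12),
(14), and (16) in the same manner as in the first embodiment of the first modification.

Therefore, renewed weight parameters $W^{HI}_{ji}(t+1)$, $W^{OH}_{kj}(t+1)$ and the renewed threshold values
$\theta^H_j(t+1)$, $\theta^O_k(t+1)$ are found according to the equations (17) to (20) in the same manner as in the first
embodiment of the first modification.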
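
The averaging in the equations (72) to (75) can be sketched in Python as follows. This is a minimal illustration: it assumes that each training example contributes the gradient of exactly one term (attraction, repulsion, or boundary, according to its symbol), and the function name `batch_gradient` and the sample numbers are hypothetical.

```python
def batch_gradient(per_example_grads):
    """Equations (72)-(75): average the per-example partial derivatives.

    per_example_grads[p] is the gradient vector of whichever term
    (r+^p, r-^p, or r>^p) applies to the p-th training example; the
    other two terms contribute nothing for that example (assumption).
    """
    q = len(per_example_grads)          # number Q of training examples
    n = len(per_example_grads[0])       # number of control parameters
    return [sum(g[i] for g in per_example_grads) / q for i in range(n)]


# Three hypothetical examples, two control parameters each.
grads = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.0]]
avg = batch_gradient(grads)
```

The averaged vector is what the renewal equations (10), (12), (14), and (16) of the first modification would then consume; those equations lie outside this section and are not reproduced here.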

Fig. 19 is a flowchart showing a learning method performed in the learning apparatus shown in Fig. 14 
according to the first embodiment of the second modification. 

As shown in Fig. 19, a parameter p designating a set number of items of training example data
$\{I_i^p, a_k^p, y_k^p, T_p\}$ composed of the teaching signal data $a_k^p$, $y_k^p$, the input data $I_i^p$, and the symbol $T_p$ is
set to 1 (p = 1) in a step 401 when the training example data $\{I_i^p, a_k^p, y_k^p, T_p\}$ is provided to the learning
apparatus by an operator. Thereafter, L items of input data $I_i^p$ are provided to the first neurons 22, and N
items of teaching signal data $a_k^p$, $y_k^p$ and the symbol $T_p$ are automatically transmitted to the loss function
calculating section 61 in a step 402.

Thereafter, N items of output data $O_k^p$ are calculated in the 3-layer feedforward neural network by
applying the equations (1) to (5) before the output data $O_k^p$ is transmitted to the loss function calculating
section 61 in a step 403.

Thereafter, whether the teaching signal data $a_k^p$, $y_k^p$ is the desired output data, the undesirable output
data, or the boundary specifying data is judged by inspecting the symbol $T_p$ in a prescribed control section
(not shown) in a step 404. That is, the teaching signal data $a_k^p$, $y_k^p$ transmitted to the section 61 is judged
to be the desired output data in cases where $T_p$ indicates the symbol +, and the teaching signal data $a_k^p$,
$y_k^p$ transmitted to the section 61 is judged to be the undesirable output data in cases where $T_p$ indicates
the symbol -. On the other hand, the teaching signal data $a_k^p$, $y_k^p$ transmitted to the section 61 is judged to
be the boundary specifying data in cases where $T_p$ is the symbol >.

In cases where the teaching signal data $a_k^p$, $y_k^p$ is judged to be the desired output data in the step 404,
the value of the attraction term $r_+^p$ is calculated in the attraction term calculating section 30 by utilizing the
N items of desired output data according to the equation (6), and the calculated value is stored in a
prescribed memory (not shown) of the section 30 in a step 405. Thereafter, the control parameters $W^{HI}_{ji}$,
$W^{OH}_{kj}$, $\theta^H_j$, and $\theta^O_k$ utilized in the first and second connection sections 27, 28 are transmitted to the control
parameter adjusting section 63 so that partial derivatives $\partial r_+^p/\partial W^{HI}_{ji}$, $\partial r_+^p/\partial W^{OH}_{kj}$, $\partial r_+^p/\partial\theta^H_j$, and
$\partial r_+^p/\partial\theta^O_k$ are calculated in the control parameter adjusting section 63 by utilizing the value $r_+^p$ stored in
the section 30, and are stored in a memory (not shown) of the section 63 in a step 406.

In cases where the teaching signal data $a_k^p$, $y_k^p$ is judged to be the undesirable output data in the step
404, a value of the repulsion term $r_-^p$ is calculated in the repulsion term calculating section 31 by utilizing
the N items of undesirable output data according to the equation (7), and the calculated value is stored in a
prescribed memory (not shown) of the section 31 in a step 407. Thereafter, the control parameters $W^{HI}_{ji}$,
$W^{OH}_{kj}$, $\theta^H_j$, and $\theta^O_k$ utilized in the first and second connection sections 27, 28 are transmitted to the control
parameter adjusting section 63 so that partial derivatives $\partial r_-^p/\partial W^{HI}_{ji}$, $\partial r_-^p/\partial W^{OH}_{kj}$, $\partial r_-^p/\partial\theta^H_j$, and
$\partial r_-^p/\partial\theta^O_k$ are calculated in the control parameter adjusting section 63 by utilizing the value $r_-^p$ stored in
the section 31, and are stored in the memory of the section 63 in a step 408.

In cases where the teaching signal data $a_k^p$, $y_k^p$ is judged to be the boundary specifying data in the
step 404, the value of the boundary term $r_>^p$ is calculated in the boundary term calculating section 62 by
utilizing the N items of boundary specifying data according to the equation (69), and the calculated value is
stored in a prescribed memory (not shown) of the section 62 in a step 409. Thereafter, the control
parameters $W^{HI}_{ji}$, $W^{OH}_{kj}$, $\theta^H_j$, and $\theta^O_k$ utilized in the first and second connection sections 27, 28 are
transmitted to the control parameter adjusting section 63 so that partial derivatives $\partial r_>^p/\partial W^{HI}_{ji}$,
$\partial r_>^p/\partial W^{OH}_{kj}$, $\partial r_>^p/\partial\theta^H_j$, and $\partial r_>^p/\partial\theta^O_k$ are calculated in the control parameter adjusting section 63
by utilizing the value $r_>^p$ stored in the section 62, and are stored in the memory of the section 63 in a
step 410.

Thereafter, whether or not the parameter p equals the number Q of items of training example data is
judged in the prescribed control section in a step 411. In cases where the parameter p is judged not to
equal the number Q, the parameter p is incremented by 1 in a step 412 so that the procedures from the
step 402 to the step 411 are repeated until the terms $r_+^p$, $r_-^p$, $r_>^p$ and the partial derivatives
$\partial r_+^p/\partial W^{HI}_{ji}$, $\partial r_+^p/\partial W^{OH}_{kj}$, $\partial r_+^p/\partial\theta^H_j$, $\partial r_+^p/\partial\theta^O_k$, $\partial r_-^p/\partial W^{HI}_{ji}$, $\partial r_-^p/\partial W^{OH}_{kj}$,
$\partial r_-^p/\partial\theta^H_j$, $\partial r_-^p/\partial\theta^O_k$, $\partial r_>^p/\partial W^{HI}_{ji}$, $\partial r_>^p/\partial W^{OH}_{kj}$, $\partial r_>^p/\partial\theta^H_j$, and $\partial r_>^p/\partial\theta^O_k$ are
stored throughout all the sets of training example data in the memories of the sections 61, 63.

In cases where the parameter p is judged to equal the number Q in the step 411, the judgement means
that all sets of training example data have been calculated to provide Q sets of output data for the section
61. Moreover, the judgement means that the attraction term, the repulsion term, and the boundary term $r_+^p$,
$r_-^p$, $r_>^p$ and the partial derivatives $\partial r_+^p/\partial W^{HI}_{ji}$, $\partial r_+^p/\partial W^{OH}_{kj}$, $\partial r_+^p/\partial\theta^H_j$, $\partial r_+^p/\partial\theta^O_k$,
$\partial r_-^p/\partial W^{HI}_{ji}$, $\partial r_-^p/\partial W^{OH}_{kj}$, $\partial r_-^p/\partial\theta^H_j$, $\partial r_-^p/\partial\theta^O_k$, $\partial r_>^p/\partial W^{HI}_{ji}$, $\partial r_>^p/\partial W^{OH}_{kj}$,
$\partial r_>^p/\partial\theta^H_j$, and $\partial r_>^p/\partial\theta^O_k$ are stored throughout all the sets of training example data in the
memories of the sections 61, 63. Therefore, the renewed values $\Delta W^{HI}_{ji}(t)$, $\Delta W^{OH}_{kj}(t)$, $\Delta\theta^H_j(t)$, and
$\Delta\theta^O_k(t)$ are calculated by applying the equations (72) to (75), (10), (12), (14), and (16) so that the
weight parameters $W^{HI}_{ji}(t)$, $W^{OH}_{kj}(t)$ and the threshold values $\theta^H_j(t)$, $\theta^O_k(t)$ are slightly renewed by
applying the equations (17) to (20) in a step 413. That is, the renewed weight parameters $W^{HI}_{ji}(t+1)$,
$W^{OH}_{kj}(t+1)$ and the renewed threshold values $\theta^H_j(t+1)$, $\theta^O_k(t+1)$ are determined. The value of the loss
function r is then found by utilizing the terms $r_+^p$, $r_-^p$, and $r_>^p$ stored in the memory of the section 61
according to the equation (71) in a step 414. In cases where the calculation in the steps 413, 414 is
implemented, for example, for the first time, the iteration number t equals 1.

Thereafter, whether or not the value of the loss function r satisfies the finishing condition is judged by
applying the equation (21) in a step 415. That is, in cases where the value of the loss function r is equal to
or greater than $E_L$, the finishing condition is not satisfied so that the iteration number t is incremented by 1.
Thereafter, the procedures from the step 401 to the step 415 are iterated while utilizing the renewed control
parameters $W^{HI}_{ji}(t+1)$, $W^{OH}_{kj}(t+1)$, $\theta^H_j(t+1)$, and $\theta^O_k(t+1)$ in place of the control parameters $W^{HI}_{ji}(t)$,
$W^{OH}_{kj}(t)$, $\theta^H_j(t)$, and $\theta^O_k(t)$.

The iteration of the procedures between the step 401 and the step 415 is continued until the finishing
condition is satisfied.

In cases where the value of the loss function r is less than $E_L$, the finishing condition is satisfied in the
step 415 so that the learning in the learning apparatus is completed. That is, the control parameters such as
the weight parameters and the threshold values are adjusted to the most appropriate values because the
steepest-descent method is utilized.

Accordingly, after the weight parameters and the threshold values are adjusted to the most appropriate
values, the operator can accomplish the most appropriate diagnosis, the most appropriate robot control, or
the like without utilizing the section 63.

In the first embodiment of the second modification, the loss function occasionally does not reach the
finishing condition because the neural network is a non-linear type. In this case, it is preferable that a
maximum iteration number $t_{MAX}$ be added to the finishing condition. That is, the learning for adjusting
control parameters such as the weight parameters and the threshold values is stopped when the number of
iterations reaches the maximum iteration number $t_{MAX}$. Thereafter, initial values of the control parameters
are reset by the operator before the learning is resumed.
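The finishing condition and the maximum iteration number can be sketched as follows. This is only a minimal illustration under stated assumptions, not the apparatus of the embodiment: the one-parameter conversion y = theta * u, the function names learn and learn_with_restarts, and the numerical values chosen for the threshold, the maximum iteration number, the learning rate, and the momentum parameter are all hypothetical. The loss r = exp(-x^2) and the momentum-type renewal follow the forms recited in claims 1 and 2.

```python
import math


def learn(theta0, u, z, e_l=0.01, t_max=200, eta=0.5, alpha=0.5):
    """Renew theta until r < e_l or until t_max iterations are used up.

    Toy conversion (assumption): output y = theta * u, difference x = y - z,
    loss r = exp(-x**2), where z is the undesirable output data.
    Returns (theta, r, iterations, converged).
    """
    theta, delta = theta0, 0.0
    for t in range(1, t_max + 1):
        x = theta * u - z
        r = math.exp(-x * x)
        if r < e_l:                      # finishing condition satisfied
            return theta, r, t, True
        grad = -2.0 * x * r * u          # dr/dtheta for the toy conversion
        # momentum-type renewal:
        # delta(t+1) = -(1 - alpha) * eta * dr/dtheta + alpha * delta(t)
        delta = -(1.0 - alpha) * eta * grad + alpha * delta
        theta += delta
    x = theta * u - z
    return theta, math.exp(-x * x), t_max, False   # stopped at t_max


def learn_with_restarts(initial_values, u, z, **kwargs):
    """Try each initial value in turn, as an operator resetting it would."""
    for theta0 in initial_values:
        theta, r, t, converged = learn(theta0, u, z, **kwargs)
        if converged:
            return theta0, theta, r, t
    return None                           # no initial value reached r < e_l


if __name__ == "__main__":
    # theta0 = 0.5 starts exactly on the undesirable output (zero gradient),
    # so the first run stops at t_max; the second initial value is then tried.
    print(learn_with_restarts([0.5, 1.0], u=1.0, z=0.5))
```

In this toy setting the first initial value never moves because the gradient of exp(-x^2) vanishes at x = 0, which is one simple way in which a loss can fail to reach the finishing condition.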

Moreover, either the desired output data, the undesirable output data, or the boundary specifying data
is provided to the learning apparatus as a set of training example data in the first embodiment of the
second modification. However, it is preferable that the desired output data, the undesirable output data, and the
boundary specifying data be intermingled in the same set of training example data. In this case, the output
vector corresponding to the boundary specifying data is defined in prescribed dimensions of less than the
N dimensions.
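One possible reading of intermingled teaching signal data is sketched below. It is an illustration only, resting on several assumptions: the attraction and repulsion terms take the forms recited in the claims (one half of the squared difference, and a weighted exp of the negative squared difference), the boundary term takes the exponential form of the equation (76), the squared difference is taken as a squared Euclidean distance, and the inner product d is computed from a hypothetical normal vector and offset. The name total_loss and the tuple layout of the signals are likewise hypothetical.

```python
import math


def sq_dist(y, z):
    """Squared Euclidean distance between two vectors (assumed form of x**2)."""
    return sum((a - b) ** 2 for a, b in zip(y, z))


def total_loss(y, signals, gamma_minus=1.0, gamma_gt=1.0):
    """Sum the terms contributed by a mixed list of teaching signals.

    y       : output vector of the learning apparatus
    signals : list of (kind, data) pairs, where kind is one of
              "desired", "undesirable", or "boundary"
    """
    r = 0.0
    for kind, data in signals:
        if kind == "desired":
            # attraction term: decreases as y approaches the desired data
            r += 0.5 * sq_dist(y, data)
        elif kind == "undesirable":
            # repulsion term: decreases as y moves away from the undesirable data
            r += gamma_minus * math.exp(-sq_dist(y, data))
        elif kind == "boundary":
            # boundary term in the exponential form; d > 0 is taken to mean
            # that y lies in the desired region (assumed convention)
            normal, offset = data
            d = sum(n * a for n, a in zip(normal, y)) - offset
            r += gamma_gt * math.exp(-d)
        else:
            raise ValueError(f"unknown teaching signal kind: {kind}")
    return r


if __name__ == "__main__":
    signals = [
        ("desired", [1.0, 0.0]),
        ("undesirable", [-1.0, 0.0]),
        ("boundary", ([0.0, 1.0], 0.0)),   # constrains only the second dimension
    ]
    print(total_loss([0.8, 0.3], signals))
```

The boundary entry in the example constrains only one of the two output dimensions, which mirrors the remark that the boundary specifying data is defined in fewer than N dimensions.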

Next, learning methods according to other embodiments of the second modification are described with
reference to Figs. 20 and 21.

Fig. 20 is a graphic view showing a boundary term $r_{>}^{p}$ of the loss function according to a second
embodiment of the second modification.

The boundary term $r_{>}^{p}$ shown in Fig. 20 is defined by utilizing the following equation.

$r_{>}^{p} = \gamma_{>} \exp(-d^{p})$ (76)

The features of the boundary term $r_{>}^{p}$ defined by the equation (76) are as follows.
A partial derivative of the boundary term $r_{>}^{p}$ with respect to the inner product $d^{p}$ diverges when the
output vector is positioned deep within the undesired region ($d^{p} \to -\infty$). Moreover, the partial derivative
$\partial r_{>}^{p}/\partial d^{p}$ gradually approaches the zero value when the output vector is positioned deep within the desired
region ($d^{p} \to +\infty$).

Therefore, the output vector is strongly shifted to the desired region in early iterative calculations
because the absolute value of the partial derivative $\partial r_{>}^{p}/\partial d^{p}$ is large in the undesired region. Thereafter, the
output vector is not shifted in practice because the absolute value of the partial derivative $\partial r_{>}^{p}/\partial d^{p}$ is small
in the desired region. In other words, the learning is stabilized.

In addition, the shape of the boundary term $r_{>}^{p}$ is simple, and the value of the boundary term $r_{>}^{p}$ is
smoothly shifted. Therefore, the boundary term $r_{>}^{p}$ can be easily handled.

Moreover, because the value of the boundary term $r_{>}^{p}$ is exponentially shifted, the value is not shifted in
practice when the output vector is positioned deep within the desired region. In other words, the influence of
the boundary specifying data on the output vector is largely reduced within a deep desired region.

On the other hand, because the value of the boundary term $r_{>}^{p}$ is large within a deep undesired region,
the output vector positioned within the deep undesired region is strongly shifted to the desired region.
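The behaviour described for the equation (76) can be checked numerically with a short sketch. The function names and the sample values of the inner product are assumptions made for illustration; only the exponential form of the boundary term is taken from the text.

```python
import math


def boundary_term(d, gamma=1.0):
    """Boundary term in the exponential form gamma * exp(-d)."""
    return gamma * math.exp(-d)


def boundary_term_derivative(d, gamma=1.0):
    """Partial derivative of the boundary term with respect to d."""
    return -gamma * math.exp(-d)


if __name__ == "__main__":
    # Negative d: deep in the undesired region; positive d: deep in the desired region.
    for d in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"d = {d:+.1f}  term = {boundary_term(d):.4f}  "
              f"derivative = {boundary_term_derivative(d):.4f}")
```

The magnitude of the derivative grows without bound as d becomes more negative and shrinks toward zero as d becomes more positive, which is the property the text relies on for strong early shifts and a stable end of learning.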

Fig. 21 is a graphic view showing a boundary term $r_{>}^{p}$ according to a third embodiment of the second
modification.

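The boundary term $r_{>}^{p}$ shown in Fig. 21 is defined by utilizing the following equation.

$r_{>}^{p} = -\gamma_{>} \, d^{p} \, \{\mathrm{sgn}(d^{p}) - 1\}$ (77)

where $\mathrm{sgn}(d^{p})$ is a sign function. That is,

$\mathrm{sgn}(d^{p}) = +1$ for $d^{p} \ge 0$, and $\mathrm{sgn}(d^{p}) = -1$ for $d^{p} < 0$.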

The features of the boundary term $r_{>}^{p}$ defined by the equation (77) are as follows.
A partial derivative of the boundary term $r_{>}^{p}$ with respect to the inner product $d^{p}$ equals a constant value
when the output vector is positioned within the undesired region. Moreover, the partial derivative $\partial r_{>}^{p}/\partial d^{p}$
equals the zero value when the output vector is positioned within the desired region.

Therefore, the influence of the boundary specifying data on the output vector positioned within the
undesired region is constant so that the output vector is gradually shifted to the desired region. Moreover,
the output vector positioned within the desired region is not shifted regardless of whether the calculations
are iterated. In other words, the boundary specifying data exerts no influence on the output vector
positioned within the desired region.

Accordingly, because the partial derivative $\partial r_{>}^{p}/\partial d^{p}$ of the boundary term is constant within the
undesired region, the learning is stabilized, although the speed of the learning is inferior.

In addition, because the shape of the boundary term $r_{>}^{p}$ is simple, it is easy to analytically handle the
boundary term $r_{>}^{p}$.

As mentioned above, the learning is implemented to shift the output vector to the desired region which
is designated by the boundary specifying data by utilizing the learning method and the learning apparatus
according to the second modification. Therefore, the learning performance can be improved. That is, the
learning can be accomplished according to learning conditions.

Moreover, the operator can freely utilize the desired output data, the undesirable output data, and the
boundary specifying data according to the objects of the learning.

Having illustrated and described the principles of our invention in a preferred embodiment thereof, it
should be readily apparent to those skilled in the art that the invention can be modified in arrangement and
detail without departing from such principles. We claim all modifications coming within the spirit and scope
of the accompanying claims.

Reference signs in the claims are intended for better understanding and shall not limit the scope. 

Claims

1. A learning method for adjusting control parameters $\theta$ of a learning apparatus in which input data is
converted into output data by utilizing the control parameters $\theta$ and teaching signal data, comprising
steps of: 

defining undesirable output data which is not acceptable as output data, the undesirable output 
data being a type of the teaching signal data; 

defining a loss function r of which a value decreases along with the increase of a difference x
between the undesirable output data and the output data, the loss function r being designated by an
equation $r = \exp(-x^{2})$;

providing both the input data and the undesirable output data to the learning apparatus; 

calculating a value of the output data by converting a value of the input data by utilizing values of
the control parameters $\theta$;

calculating a value of the loss function r by utilizing both the value of the output data and a value of 
the undesirable output data; 

iteratively calculating the value of the loss function r by renewing the values of the control
parameters $\theta$ to decrease the value of the loss function r until the value of the loss function r is less
than a prescribed value in cases where the value of the loss function r is equal to or greater than the
prescribed value; and

adjusting the control parameters $\theta$ of the learning apparatus to newest values of the control
parameters $\theta$ when the value of the loss function is less than the prescribed value.

2. A learning method according to claim 1 in which a renewed value of the control parameter $\theta$ in the step
of iteratively calculating the value of the loss function r is designated by an equation

$\Delta\theta(t+1) = -(1-\alpha)\,\eta\,\partial r/\partial\theta(t) + \alpha\,\Delta\theta(t)$,

where $\Delta\theta(t+1)$ is a renewed value of the control parameter $\theta$ in the t-th iterative calculation, $\partial r/\partial\theta(t)$ is
a partial derivative of the loss function r with respect to the control parameter $\theta$ in the t-th iterative
calculation, $\alpha$ is a momentum parameter for adjusting the contribution of $\Delta\theta(t)$ to $\Delta\theta(t+1)$, and $\eta$ is a
learning rate for adjusting the contribution of $\partial r/\partial\theta(t)$ to $\Delta\theta(t+1)$.

3. A learning method for adjusting control parameters $\theta$ of a learning apparatus in which input data is
converted into output data by utilizing the control parameters $\theta$ and teaching signal data, comprising
steps of: 

defining undesirable output data which is not acceptable as output data, the undesirable output 
data being a type of the teaching signal data; 

defining desired output data which is desired as output data, the desired output data being another 
type of the teaching signal data; 

defining an attraction term r+ of which a value decreases along with the decrease of a difference x+
between the desired output data and the output data, the attraction term r+ being designated by an
equation $r_{+} = \frac{1}{2} x_{+}^{2}$;

defining a repulsion term r- of which a value decreases along with the increase of a difference x-
between the undesirable output data and the output data by utilizing a parameter $\gamma_{-}$ for adjusting a
contributory ratio of the undesirable output data to the desired output data, the repulsion term r- being
designated by an equation $r_{-} = \gamma_{-} \exp(-x_{-}^{2})$;

defining a loss function r found by adding the repulsion term r- and the attraction term r+ together;

providing both the input data and the teaching signal data to the learning apparatus; 

calculating a value of the output data by converting a value of the input data by utilizing values of
the control parameters $\theta$;

calculating a value of the loss function r by utilizing both the value of the output data and a value of 
the teaching signal data; 

iteratively calculating the value of the loss function r by renewing the values of the control
parameters $\theta$ to decrease the value of the loss function until the value of the loss function is less than a
prescribed value in cases where the value of the loss function r is equal to or greater than the
prescribed value; and

adjusting the control parameters $\theta$ of the learning apparatus to newest values of the control
parameters $\theta$ when the value of the loss function r is less than the prescribed value.

4. A learning method for adjusting control parameters $\theta$ such as weight parameters and threshold values
of a learning apparatus by utilizing teaching signal data and a plurality of neurons interconnected in an 
artificial neural network in which input data provided to first neurons (22) of a first stage (21) is weighted 
with the weight parameters and the threshold values are subtracted from the weighted input data so 
that the weighted input data is transmitted to final neurons (26) of a final stage (25) through hidden 
stages (24) in which the data is weighted with the weight parameters, reduced by the threshold values,
and converted by applying a prescribed monotone increasing function, after which output data is 
provided from the final neurons (26), comprising steps of: 

defining undesirable output data which is not acceptable as output data, the undesirable output 
data being a type of the teaching signal data; 

defining a loss function r which decreases along with the increase of the difference between the 
undesirable output data and the output data; 

providing the input data to the first neurons (22); 

providing the undesirable output data to the learning apparatus; 

calculating a value of the output data obtained by converting the input data while utilizing the 
control parameters and the monotone increasing function; 

calculating a value of the loss function r by utilizing both the value of the output data and a value of 
the undesirable output data; 

iteratively calculating the value of the loss function r by renewing the control parameters to 
decrease the value of the loss function r until the value of the loss function r is less than a prescribed 
value in cases where the value of the loss function r is equal to or greater than the prescribed value; 
and 

adjusting the control parameters of the learning apparatus to the newest control parameters when 
the value of the loss function r is less than the prescribed value. 

5. A learning method according to claim 4 in which the loss function r in the step of defining a loss
function r is designated by an equation $r = \exp(-x^{2})$, where x is the difference between the undesirable
output data and the output data.

6. A learning method according to claim 4 in which a renewed value of the control parameter $\theta$ in the step
of iteratively calculating the value of the loss function r is designated by an equation

$\Delta\theta(t+1) = -(1-\alpha)\,\eta\,\partial r/\partial\theta(t) + \alpha\,\Delta\theta(t)$,

where $\Delta\theta(t+1)$ is a renewed value of the control parameter $\theta$ in the t-th iterative calculation, $\partial r/\partial\theta(t)$ is
a partial derivative of the loss function r with respect to the control parameter $\theta$ in the t-th iterative
calculation, $\alpha$ is a momentum parameter for adjusting the contribution of $\Delta\theta(t)$ to $\Delta\theta(t+1)$, and $\eta$ is a
learning rate for adjusting the contribution of $\partial r/\partial\theta(t)$ to $\Delta\theta(t+1)$.

7. A learning method for adjusting both weight parameters and threshold values of a learning apparatus by 
utilizing teaching signal data and a plurality of neurons interconnected in an artificial neural network in 
which input data provided to first neurons (22) of a first stage (21) is weighted with the weight 
parameters and the threshold values are subtracted from the weighted input data so that the weighted 
input data is transmitted to final neurons (26) of a final stage (25) through hidden stages (24) in which 
the data is weighted with the weight parameters, reduced by the threshold values, and converted by
applying a prescribed monotone increasing function, after which output data is provided from the final 
neurons (26), comprising steps of: 

defining undesirable output data which is not acceptable as output data, the undesirable output 
data being a type of the teaching signal data; 

defining desired output data which is desired as output data, the desired output data being another 
type of the teaching signal data; 

defining a repulsion term r- of which a value decreases along with the increase of a difference x- 
between the undesirable output data and the output data; 

defining an attraction term r+ of which a value decreases along with the decrease of a difference x+ 
between the desired output data and the output data; 

defining a loss function r found by adding the repulsion term r- and the attraction term r+ together; 

providing the input data to the first neurons (22); 

providing the teaching signal data to the learning apparatus;

calculating a value of the output data obtained by converting the input data while utilizing the
weight parameters, the threshold values, and the monotone increasing function;

calculating a value of the loss function r by utilizing both the value of the output data and a value of 
the teaching signal data; 

iteratively calculating the value of the loss function r by renewing the weight parameters and the
threshold values to decrease the value of the loss function r until the value of the loss function r is less
than a prescribed value in cases where the value of the loss function r is equal to or greater than the 
prescribed value; and 

adjusting the weight parameters and the threshold values of the learning apparatus to the newest 
weight parameters and the newest threshold values when the value of the loss function r is less than 
the prescribed value. 

8. A learning method according to claim 7 in which the repulsion term r- in the step of defining a
repulsion term r- is designated by an equation $r_{-} = \gamma_{-} \exp(-x_{-}^{2})$ by utilizing a parameter $\gamma_{-}$ for
adjusting a contributory ratio of the undesirable output data to the desired output data, and
the attraction term r+ in the step of defining an attraction term r+ is designated by an equation
$r_{+} = \frac{1}{2} x_{+}^{2}$.

9. A learning method for adjusting control parameters $\theta$ of a learning apparatus in which input data is
converted into output data by utilizing the control parameters and teaching signal data, comprising 
steps of: 

defining undesirable output data which is not acceptable as output data and is a type of the 
teaching signal data; 

defining desired output data which is desired as output data and is another type of the teaching 
signal data; 

defining a distribution function $G(x|\theta)$ relating to the desired output data by utilizing both the control
parameters $\theta$ and a difference x between the output data and the desired output data;

defining a distribution function $1 - G(x|\theta)$ relating to the undesirable output data by utilizing both the
control parameters $\theta$ and a difference x between the output data and the undesirable output data;

substituting the distribution functions $G(x|\theta)$, $1 - G(x|\theta)$ relating to the desired and undesirable
output data for a logarithmic likelihood $l_E(x)$ of which a value is maximized when the difference between
the output data and the desired output data is decreased and the difference between the output data
and the undesirable output data is increased, the substituted logarithmic likelihood $l_E(x)$ being represented
by a following equation

$l_E(x) = \sum_{p:D} \ln G(x|\theta) + \sum_{p:U} \ln\{1 - G(x|\theta)\}$

where $\sum_{p:D}$ indicates that the adding calculations are implemented when the desired output data is
provided to the learning apparatus, and $\sum_{p:U}$ indicates that the adding calculations are implemented
when the undesirable output data is provided to the learning apparatus;

defining a loss function $r = -l_E(x)$;

providing both the input data and the teaching signal data to the learning apparatus; 
calculating a value of the output data by converting a value of the input data by utilizing values of 
the control parameters; 

calculating a value of the loss function r; 

iteratively calculating the value of the loss function r by renewing the values of the control
parameters $\theta$ to decrease the value of the loss function r until the value of the loss function r is less
than a prescribed value in cases where the value of the loss function r is equal to or greater than the
prescribed value; and

adjusting the control parameters $\theta$ of the learning apparatus to newest values of the control
parameters $\theta$ when the value of the loss function r is less than the prescribed value.

10. A learning method for adjusting control parameters $\theta$ such as weight parameters and threshold values of
a learning apparatus by utilizing teaching signal data and a plurality of neurons interconnected in an
artificial neural network in which input data provided to first neurons (22) of a first stage (21) is weighted
with the weight parameters and the threshold values are subtracted from the weighted input data so
that the weighted input data is transmitted to final neurons (26) of a final stage (25) through hidden
stages (24) in which the data is weighted with the weight parameters, reduced by the threshold values,
and converted by applying a prescribed monotone increasing function, after which output data is
provided from the final neurons (26), comprising steps of:

defining undesirable output data which is not acceptable as output data and is a type of the
teaching signal data;

defining desired output data which is desired as output data and is another type of the teaching 
signal data; 

defining a distribution function $G(x|\theta)$ relating to the desired output data by utilizing both the control
parameters $\theta$ and a difference x between the output data and the desired output data;

defining a distribution function $1 - G(x|\theta)$ relating to the undesirable output data by utilizing both the
control parameters $\theta$ and a difference x between the output data and the undesirable output data;

substituting the distribution functions $G(x|\theta)$, $1 - G(x|\theta)$ relating to the desired and undesirable
output data for a logarithmic likelihood $l_E(x)$ of which a value is maximized when the difference between
the output data and the desired output data is decreased and the difference between the output data
and the undesirable output data is increased, the substituted logarithmic likelihood $l_E(x)$ being represented
by a following equation

$l_E(x) = \sum_{p:D} \ln G(x|\theta) + \sum_{p:U} \ln\{1 - G(x|\theta)\}$

where $\sum_{p:D}$ indicates that the adding calculations are implemented when the desired output data is
provided to the learning apparatus, and $\sum_{p:U}$ indicates that the adding calculations are implemented
when the undesirable output data is provided to the learning apparatus;

defining a loss function $r = -l_E(x)$;

providing the input data to the first neurons (22); 

providing the teaching signal data to the learning apparatus; 

calculating a value of the output data obtained by converting the input data while utilizing the
control parameters;

calculating a value of the loss function r;

iteratively calculating the value of the loss function r by renewing the values of the control
parameters $\theta$ to decrease the value of the loss function r until the value of the loss function r is less
than a prescribed value in cases where the value of the loss function r is equal to or greater than the
prescribed value; and

adjusting the control parameters $\theta$ of the learning apparatus to newest values of the control
parameters $\theta$ when the value of the loss function r is less than the prescribed value.

11. A learning method for adjusting control parameters of a learning apparatus in which input data is
converted into output data by utilizing the control parameters and teaching signal data, comprising
steps of:

defining undesirable output data which is not acceptable as output data and is a type of the
teaching signal data;

defining desired output data which is desired as output data and is another type of the teaching
signal data, the desired output data occupying a region s;

defining a distribution function $s \cdot g(x_{+})$ relating to the desired output data by utilizing both the region
s occupied by the desired output data and a normal distribution $g(x_{+})$ which is defined by both a
variance $\sigma^{2}$ and a difference x+ between the output data and the desired output data, the distribution
function $s \cdot g(x_{+})$ being defined by an equation

$s \cdot g(x_{+}) = s \cdot \frac{1}{(2\pi)^{1/2}\sigma} \cdot \exp[-x_{+}^{2}/(2\sigma^{2})]$;

defining a distribution function $1 - s \cdot g(x_{-})$ relating to the undesirable output data by utilizing both
the region s occupied by the desired output data and a normal distribution $g(x_{-})$ which is defined by
both the variance $\sigma^{2}$ and a difference x- between the output data and the undesirable output data, the
distribution function $1 - s \cdot g(x_{-})$ being defined by an equation

$1 - s \cdot g(x_{-}) = 1 - s \cdot \frac{1}{(2\pi)^{1/2}\sigma} \cdot \exp[-x_{-}^{2}/(2\sigma^{2})]$;

providing both the input data and the teaching signal data to the learning apparatus; 

calculating a value of the output data by converting a value of the input data by utilizing values of
the control parameters;

deriving an attraction term $r_{+} = \ln(\sigma) + x_{+}^{2}/(2\sigma^{2})$ and a repulsion term
$r_{-} = \frac{s}{(2\pi)^{1/2}\sigma} \cdot \exp[-x_{-}^{2}/(2\sigma^{2})]$ from a logarithmic-likelihood $l_E(x)$ based on a statistical entropy as follows,

$-l_E(x) = -\ln g(x_{+}) + s \cdot g(x_{-})$
$r_{+} = -\ln g(x_{+})$
$r_{-} = s \cdot g(x_{-})$;

calculating a value of the attraction term r+ in cases where the desired output data is provided to 
the learning apparatus as the teaching signal data, a value of the attraction term r+ being decreased 
along with the decrease of the difference x+; 

calculating a value of the repulsion term r- in cases where the undesirable output data is provided 
to the learning apparatus as the teaching signal data, a value of the repulsion term r- being decreased 
along with the increase of the difference x-; 

calculating a loss function r found by adding the repulsion term r- and the attraction term r+
together;

iteratively calculating the value of the loss function r by renewing the values of both the control
parameters and the variance $\sigma^{2}$ to decrease the value of the loss function r until the value of the loss
function r is less than a prescribed value in cases where the value of the loss function is equal to or
greater than the prescribed value; and

adjusting both the control parameters and the variance $\sigma^{2}$ of the learning apparatus to newest
values of both the control parameters and the variance $\sigma^{2}$ when the value of the loss function r is less
than the prescribed value.

12. A learning method according to claim 11 in which renewed values of both the control parameter $\theta$ and
the variance $\sigma^{2}$ in the step of iteratively calculating the value of the loss function r are designated by
equations

$\Delta\theta(t+1) = -(1-\alpha)\,\eta\,\partial r/\partial\theta(t) + \alpha\,\Delta\theta(t)$
$\Delta\sigma(t+1) = -(1-\alpha)\,\eta\,\partial r/\partial\sigma(t) + \alpha\,\Delta\sigma(t)$,

where $\Delta\theta(t+1)$ and $\Delta\sigma(t+1)$ are renewed values of the control parameters $\theta$, $\sigma$ in the t-th iterative
calculation, $\partial r/\partial\theta(t)$ and $\partial r/\partial\sigma(t)$ are partial derivatives of the loss function r with respect to the control
parameters $\theta$, $\sigma$ in the t-th iterative calculation, $\alpha$ is a momentum parameter for adjusting the
contribution of $\Delta\theta(t)$ and $\Delta\sigma(t)$ to $\Delta\theta(t+1)$ and $\Delta\sigma(t+1)$, and $\eta$ is a learning rate for adjusting the
contribution of $\partial r/\partial\theta(t)$ and $\partial r/\partial\sigma(t)$ to $\Delta\theta(t+1)$ and $\Delta\sigma(t+1)$.

13. A learning method for adjusting control parameters $\theta$ such as weight parameters and threshold values
of a learning apparatus by utilizing teaching signal data and a plurality of neurons interconnected in an
artificial neural network in which input data provided to first neurons (22) of a first stage (21) is weighted
with the weight parameters and the threshold values are subtracted from the weighted input data so
that the weighted input data is transmitted to final neurons (26) of a final stage (25) through hidden
stages (24) in which the data is weighted with the weight parameters, reduced by the threshold values,
and converted by applying a prescribed monotone increasing function, after which output data is
provided from the final neurons (26), comprising steps of:

defining undesirable output data which is not acceptable as output data and is a type of the 
teaching signal data; 

defining desired output data which is desired as output data and is another type of the teaching 
signal data, the desired output data occupying a region s; 

defining a distribution function $s \cdot g(x_{+})$ relating to the desired output data by utilizing both the region
s occupied by the desired output data and a normal distribution $g(x_{+})$ which is defined by both a
variance $\sigma^{2}$ and a difference x+ between the output data and the desired output data, the distribution
function $s \cdot g(x_{+})$ being defined by an equation

$s \cdot g(x_{+}) = s \cdot \frac{1}{(2\pi)^{1/2}\sigma} \cdot \exp[-x_{+}^{2}/(2\sigma^{2})]$;

defining a distribution function $1 - s \cdot g(x_{-})$ relating to the undesirable output data by utilizing both
the region s occupied by the desired output data and a normal distribution $g(x_{-})$ which is defined by
both the variance $\sigma^{2}$ and a difference x- between the output data and the undesirable output data, the
distribution function $1 - s \cdot g(x_{-})$ being defined by an equation

$1 - s \cdot g(x_{-}) = 1 - s \cdot \frac{1}{(2\pi)^{1/2}\sigma} \cdot \exp[-x_{-}^{2}/(2\sigma^{2})]$;

providing the input data to the first neurons (22); 

providing the teaching signal data to the learning apparatus; 

calculating a value of the output data obtained by converting the input data while utilizing the 
control parameters; 

deriving an attraction term $r_{+} = \ln(\sigma) + x_{+}^{2}/(2\sigma^{2})$ and a repulsion term
$r_{-} = \frac{s}{(2\pi)^{1/2}\sigma} \cdot \exp[-x_{-}^{2}/(2\sigma^{2})]$ from a logarithmic-likelihood $l_E(x)$ based on a statistical entropy as follows,

$-l_E(x) = -\ln g(x_{+}) + s \cdot g(x_{-})$
$r_{+} = -\ln g(x_{+})$
$r_{-} = s \cdot g(x_{-})$;

calculating a value of the attraction term r+ in cases where the desired output data is provided to
the learning apparatus as the teaching signal data, a value of the attraction term r+ being decreased
along with the decrease of the difference x+;

calculating a value of the repulsion term r- in cases where the undesirable output data is provided 
to the learning apparatus as the teaching signal data, a value of the repulsion term r- being decreased 
along with the increase of the difference x-; 

calculating a loss function r found by adding the repulsion term r- and the attraction term r+ 
together; 

iteratively calculating the value of the loss function r by renewing the values of the control
parameters and the variance $\sigma^{2}$ to decrease the value of the loss function r until the value of the loss
function r is less than a prescribed value in cases where the value of the loss function r is equal to or
greater than the prescribed value; and

adjusting the control parameters and the variance $\sigma^{2}$ of the learning apparatus to newest values of
the control parameters and the variance $\sigma^{2}$ when the value of the loss function r is less than the
prescribed value.

14. A learning method according to claim 13 in which renewed values of both the control parameter $\theta$ and
the variance $\sigma^{2}$ in the step of iteratively calculating the value of the loss function r are designated by
equations

$\Delta\theta(t+1) = -(1-\alpha)\,\eta\,\partial r/\partial\theta(t) + \alpha\,\Delta\theta(t)$
$\Delta\sigma(t+1) = -(1-\alpha)\,\eta\,\partial r/\partial\sigma(t) + \alpha\,\Delta\sigma(t)$,

where $\Delta\theta(t+1)$ and $\Delta\sigma(t+1)$ are renewed values of the control parameters $\theta$, $\sigma$ in the t-th iterative
calculation, $\partial r/\partial\theta(t)$ and $\partial r/\partial\sigma(t)$ are partial derivatives of the loss function r with respect to the control
parameters $\theta$, $\sigma$ in the t-th iterative calculation, $\alpha$ is a momentum parameter for adjusting the
contribution of $\Delta\theta(t)$ and $\Delta\sigma(t)$ to $\Delta\theta(t+1)$ and $\Delta\sigma(t+1)$, and $\eta$ is a learning rate for adjusting the
contribution of $\partial r/\partial\theta(t)$ and $\partial r/\partial\sigma(t)$ to $\Delta\theta(t+1)$ and $\Delta\sigma(t+1)$.

15. A learning method for adjusting control parameters $\theta$ such as weight parameters and threshold values
of a learning apparatus by utilizing teaching signal data and a plurality of neurons interconnected in an
artificial neural network in which input data provided to first neurons (22) of a first stage (21) is weighted
with the weight parameters and the threshold values are subtracted from the weighted input data so
that the weighted input data is transmitted to final neurons (26) of a final stage (25) through hidden
stages (24) in which the data is weighted with the weight parameters, reduced by the threshold values,
and converted by applying a prescribed monotone increasing function, after which output data is
provided from the final neurons (26), comprising steps of:

defining undesirable output data which is not acceptable as output data and is a type of the 
teaching signal data; 

defining desired output data which is desired as output data and is another type of the teaching 
signal data, the desired output data occupying a region s; 

defining a distribution function $s \cdot g(x_{+})$ relating to the desired output data by utilizing both a region s
occupied by the desired output data and a normal distribution $g(x_{+})$ which is defined by both a
difference x+ between the output data and the desired output data and a variance $\sigma^{2}$, the distribution
function $s \cdot g(x_{+})$ being defined by an equation

$s \cdot g(x_{+}) = s \cdot \frac{1}{(2\pi)^{1/2}\sigma} \cdot \exp[-x_{+}^{2}/(2\sigma^{2})]$;

defining a distribution function $1 - s \cdot g(x_{-})$ relating to the undesirable output data by utilizing both
the region s occupied by the desired output data and a normal distribution $g(x_{-})$ which is defined by
both a difference x- between the output data and the undesirable output data and the variance $\sigma^{2}$, the
distribution function $1 - s \cdot g(x_{-})$ being defined by an equation

$1 - s \cdot g(x_{-}) = 1 - s \cdot \frac{1}{(2\pi)^{1/2}\sigma} \cdot \exp[-x_{-}^{2}/(2\sigma^{2})]$;

providing the input data to the first neurons (22);
providing the teaching signal data to the learning apparatus;

calculating a value of the output data obtained by converting the input data while utilizing the
control parameters;

deriving an attraction term r+ and a repulsion term r-,

\[ r_+ = \ln\sigma + \frac{x_+^{2}}{2\sigma^{2}}, \qquad r_- = \frac{s}{(2\pi)^{1/2}\,\sigma} \exp\left[-\frac{x_-^{2}}{2\sigma^{2}}\right], \]

from a logarithmic-likelihood l_E(x) based on a statistical entropy as follows,

\[ -l_E(x) = -\ln g(x_+) + s \cdot g(x_-) \]
\[ r_+ = -\ln g(x_+) \]
\[ r_- = s \cdot g(x_-) \];

calculating a value of the attraction term r+ in cases where the desired output data is provided to
the learning apparatus as the teaching signal data, a value of the attraction term r+ being decreased
along with the decrease of the difference x+;

calculating a value of the repulsion term r- in cases where the undesirable output data is provided
to the learning apparatus as the teaching signal data, a value of the repulsion term r- being decreased
along with the increase of the difference x-;

calculating a loss function r found by adding the repulsion term r- and the attraction term r+
together;

iteratively calculating the value of the loss function r by renewing the values of the control
parameters and the variance σ² to decrease the value of the loss function r until the value of the loss
function r is less than a first small value;

resetting the distribution functions s·g(x+), 1 - s·g(x-) relating to the desired and undesirable output
data to more proper distribution functions fitted to practical distributions of the distances x+, x- which are
practically obtained by iteratively calculating the value of the loss function r, when the value of the loss
function r is less than the first small value;

iteratively calculating the value of the loss function r by renewing the values of the control
parameters and the variance σ² according to the practical distributions to decrease the value of the loss
function r until the value of the loss function r is less than a second small value; and

adjusting the control parameters and the variance σ² of the learning apparatus to newest values of
the control parameters and the variance σ² when the value of the loss function r is less than the second
small value.
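
As an illustration of the attraction and repulsion terms recited in claim 15, the following Python sketch evaluates r+, r- and their sum for scalar differences. The function names and the restriction to scalar differences are assumptions made for illustration only and are not taken from the specification.

```python
import math


def normal_density(x, sigma):
    """Normal distribution g(x) with zero mean and variance sigma**2."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def attraction_term(x_plus, sigma):
    """r+ = ln(sigma) + x+^2 / (2 sigma^2).

    This equals -ln g(x+) up to the constant ln sqrt(2 pi), which does not
    affect the minimization.
    """
    return math.log(sigma) + x_plus * x_plus / (2.0 * sigma * sigma)


def repulsion_term(x_minus, sigma, s):
    """r- = s * g(x-), where s is the region occupied by the desired output data."""
    return s * normal_density(x_minus, sigma)


def loss(x_plus, x_minus, sigma, s):
    """Loss function r found by adding the attraction and repulsion terms."""
    return attraction_term(x_plus, sigma) + repulsion_term(x_minus, sigma, s)
```

In this sketch the attraction term falls as the output approaches the desired output data, and the repulsion term falls as the output moves away from the undesirable output data, which is the behaviour the claim requires of the two terms.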

16. A learning method for adjusting control parameters θ such as weight parameters and threshold values
of a learning apparatus by utilizing teaching signal data and a plurality of neurons interconnected in an
artificial neural network in which input data provided to first neurons (22) of a first stage (21) is weighted
with the weight parameters and the threshold values are subtracted from the weighted input data so
that the weighted input data is transmitted to final neurons (26) of a final stage (25) through hidden
stages (24) in which the data is weighted with the weight parameters, reduced by the threshold values,
and converted by applying a prescribed monotone increasing function, after which output data is
provided from the final neurons (26), comprising steps of:

dividing the final neurons (26) of the final stage (25) between neurons A and neurons B;

defining two items of undesirable output data A, B which are not acceptable as output data and are a
type of the teaching signal data, the undesirable output data A corresponding to the neurons A and the
undesirable output data B corresponding to the neurons B;

defining two items of desired output data A, B which are desired as output data and are another type
of the teaching signal data, the desired output data A corresponding to the neurons A and the desired
output data B corresponding to the neurons B;

defining a distribution function s_A·g_A(x_A) by utilizing both a region s_A occupied by the desired
output data A provided from the neurons A and a normal distribution g_A(x_A) which is defined by both a
difference x_A between the output data A and the desired output data A and a variance σ_A², the
distribution function s_A·g_A(x_A) being defined by an equation

\[ s_A \cdot g_A(x_A) = s_A \cdot \frac{1}{(2\pi)^{1/2}\,\sigma_A} \exp\left[-\frac{x_A^{2}}{2\sigma_A^{2}}\right] \];

defining a distribution function s_B·g_B(x_B) by utilizing both a region s_B occupied by the desired
output data B provided from the neurons B and a normal distribution g_B(x_B) which is defined by both a
difference x_B between the output data B and the desired output data B and a variance σ_B², the
distribution function s_B·g_B(x_B) being defined by an equation

\[ s_B \cdot g_B(x_B) = s_B \cdot \frac{1}{(2\pi)^{1/2}\,\sigma_B} \exp\left[-\frac{x_B^{2}}{2\sigma_B^{2}}\right] \];

defining a distribution function 1 - s_A·g_A(x_A) relating to the undesirable output data A;
defining a distribution function 1 - s_B·g_B(x_B) relating to the undesirable output data B;

providing the input data to the first neurons (22); 
providing the teaching signal data to the learning apparatus; 

calculating a value of the output data obtained by converting the input data while utilizing the 
control parameters; 

deriving attraction terms r_A+, r_B+ and repulsion terms r_A-, r_B-,

\[ r_{A+} = \ln\sigma_A + \frac{x_{A+}^{2}}{2\sigma_A^{2}}, \qquad r_{B+} = \ln\sigma_B + \frac{x_{B+}^{2}}{2\sigma_B^{2}}, \]
\[ r_{A-} = \frac{s_A}{(2\pi)^{1/2}\,\sigma_A} \exp\left[-\frac{x_{A-}^{2}}{2\sigma_A^{2}}\right], \qquad r_{B-} = \frac{s_B}{(2\pi)^{1/2}\,\sigma_B} \exp\left[-\frac{x_{B-}^{2}}{2\sigma_B^{2}}\right], \]

from a logarithmic-likelihood l_E(x) based on a statistical entropy as follows,

\[ -l_E(x) = -\ln g_A(x_{A+}) + s_A \cdot g_A(x_{A-}) - \ln g_B(x_{B+}) + s_B \cdot g_B(x_{B-}) \]
\[ r_{A+} = -\ln g_A(x_{A+}) \]
\[ r_{A-} = s_A \cdot g_A(x_{A-}) \]
\[ r_{B+} = -\ln g_B(x_{B+}) \]
\[ r_{B-} = s_B \cdot g_B(x_{B-}) \];

calculating a value of the attraction term r_A+ or r_B+ in cases where the desired output data A or B
is provided to the learning apparatus as the teaching signal data, a value of the attraction term r_A+ or
r_B+ being decreased along with the decrease of the difference x_A+ or x_B+;

calculating a value of the repulsion term r_A- or r_B- in cases where the undesirable output data A or
B is provided to the learning apparatus as the teaching signal data, a value of the repulsion term r_A- or
r_B- being decreased along with the increase of the difference x_A- or x_B-;

calculating a loss function r = r_A+ + r_B+ + r_A- + r_B-;

iteratively calculating the value of the loss function r by renewing the values of the control
parameters and the variances σ_A², σ_B² to decrease the value of the loss function r until the value of the
loss function r is less than a prescribed value in cases where the value of the loss function r is equal to
or greater than the prescribed value; and

adjusting the control parameters and the variances σ_A², σ_B² of the learning apparatus to newest
values of the control parameters and the variances σ_A², σ_B² when the value of the loss function r is less
than the prescribed value.
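
The four-term loss of claim 16 can be sketched in Python as follows. The point of the sketch is that each neuron group carries its own variance and region, so the terms for group A and group B are evaluated independently and then summed. The dictionary layout and function names are hypothetical and chosen only for illustration.

```python
import math


def gaussian_terms(x_plus, x_minus, sigma, s):
    """Return (attraction, repulsion) for one neuron group with its own sigma and s."""
    attraction = math.log(sigma) + x_plus ** 2 / (2.0 * sigma ** 2)
    repulsion = (s / (math.sqrt(2.0 * math.pi) * sigma)
                 * math.exp(-x_minus ** 2 / (2.0 * sigma ** 2)))
    return attraction, repulsion


def two_group_loss(group_a, group_b):
    """r = r_A+ + r_B+ + r_A- + r_B-.

    Each group is a dict with keys x_plus, x_minus, sigma and s
    (a hypothetical layout, not taken from the specification).
    """
    total = 0.0
    for group in (group_a, group_b):
        attraction, repulsion = gaussian_terms(
            group["x_plus"], group["x_minus"], group["sigma"], group["s"])
        total += attraction + repulsion
    return total
```

For example, `two_group_loss({"x_plus": 0.2, "x_minus": 1.5, "sigma": 1.0, "s": 0.5}, {"x_plus": 0.4, "x_minus": 2.0, "sigma": 0.8, "s": 0.3})` evaluates the loss for one pair of teaching signals with different variances for the two groups.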


17. A learning method according to claim 16 in which renewed values of both the control parameters θ and
the variance σ in the step of iteratively calculating the value of the loss function r are designated by
equations

\[ \Delta\theta(t+1) = -(1-\alpha)\,\eta\,\frac{\partial r}{\partial\theta}(t) + \alpha\,\Delta\theta(t) \]
\[ \Delta\sigma(t+1) = -(1-\alpha)\,\eta\,\frac{\partial r}{\partial\sigma}(t) + \alpha\,\Delta\sigma(t), \]

where Δθ(t+1) and Δσ(t+1) are renewed values of the control parameters θ, σ in the t-th iterative
calculation, ∂r/∂θ(t) and ∂r/∂σ(t) are partial derivatives of the loss function r with respect to the control
parameters θ, σ in the t-th iterative calculation, α is a momentum parameter for adjusting the
contribution of Δθ(t) and Δσ(t) to Δθ(t+1) and Δσ(t+1), and η is a learning rate for adjusting the
contribution of ∂r/∂θ(t) and ∂r/∂σ(t) to Δθ(t+1) and Δσ(t+1).
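
The renewal rule of claim 17 is a gradient step with a momentum term. A minimal Python sketch is given below under two assumptions that are not stated in the claim: the renewed values are treated as increments added to the current parameters, and a toy quadratic loss stands in for the loss function r. All names are illustrative.

```python
def momentum_update(delta_prev, grad, alpha, eta):
    """Delta(t+1) = -(1 - alpha) * eta * dr/dp(t) + alpha * Delta(t)."""
    return -(1.0 - alpha) * eta * grad + alpha * delta_prev


def minimize(grad_fn, theta, sigma, alpha=0.5, eta=0.5, steps=100):
    """Iteratively renew theta and sigma with the momentum rule.

    Assumption: the renewed values are added to the current parameters.
    """
    d_theta, d_sigma = 0.0, 0.0
    for _ in range(steps):
        g_theta, g_sigma = grad_fn(theta, sigma)
        d_theta = momentum_update(d_theta, g_theta, alpha, eta)
        d_sigma = momentum_update(d_sigma, g_sigma, alpha, eta)
        theta += d_theta
        sigma += d_sigma
    return theta, sigma


def toy_gradient(theta, sigma):
    """Gradient of the stand-in loss r = (theta - 1)^2 + (sigma - 2)^2."""
    return 2.0 * (theta - 1.0), 2.0 * (sigma - 2.0)
```

Calling `minimize(toy_gradient, 0.0, 1.0)` drives the parameters toward the minimum of the stand-in loss at θ = 1, σ = 2. A larger α gives more weight to the previous renewal and less to the current gradient.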
18. A learning method for adjusting control parameters of a learning apparatus in which input data is
converted into output data by utilizing the control parameters and teaching signal data, comprising 
steps of: 

defining undesirable output data which is not acceptable as output data and is a type of the 
teaching signal data; 

defining desired output data which is desired as output data and is another type of teaching signal 
data, the desired output data occupying a region s; 

defining a distribution function s·Ga(x+) of the desired output data by utilizing both the region s
occupied by the desired output data and a generalized Gaussian distribution Ga(x) which is defined by
a parameter σ, a Gamma function Γ, a parameter b, and a difference x+ between the output data and
the desired output data, the distribution function s·Ga(x+) being defined by an equation

\[ s \cdot Ga(x_+) = s \cdot \left\{2\, b^{1/b}\, \sigma\, \Gamma(1/b + 1)\right\}^{-1} \exp\left\{-\frac{|x_+|^{b}}{b\,\sigma^{b}}\right\} \];

defining a distribution function 1 - s·Ga(x-) of the undesirable output data by utilizing both the
region s and the generalized Gaussian distribution Ga(x) which is defined by the parameter σ, the
Gamma function Γ, the parameter b, and a difference x- between the output data and the undesirable
output data, the distribution function 1 - s·Ga(x-) being defined by an equation

\[ 1 - s \cdot Ga(x_-) = 1 - s \cdot \left\{2\, b^{1/b}\, \sigma\, \Gamma(1/b + 1)\right\}^{-1} \exp\left\{-\frac{|x_-|^{b}}{b\,\sigma^{b}}\right\} \];

providing both the input data and the teaching signal data to the learning apparatus; 
calculating a value of the output data by converting a value of the input data by utilizing values of 
the control parameters; 

deriving an attraction term r+ and a repulsion term r-,

\[ r_+ = \ln\sigma + \frac{1}{b}\ln b + \ln\Gamma(1/b + 1) + \frac{|x_+|^{b}}{b\,\sigma^{b}}, \]
\[ r_- = s \cdot \left\{2\, b^{1/b}\, \sigma\, \Gamma(1/b + 1)\right\}^{-1} \exp\left\{-\frac{|x_-|^{b}}{b\,\sigma^{b}}\right\}, \]

from a logarithmic-likelihood l_E(x) based on a statistical entropy as follows,

\[ -l_E(x) = -\ln Ga(x_+) + s \cdot Ga(x_-) \]
\[ r_+ = -\ln Ga(x_+) \]
\[ r_- = s \cdot Ga(x_-) \];

calculating a value of the attraction term r+ in cases where the desired output data is provided to
the learning apparatus as the teaching signal data, a value of the attraction term r+ being decreased
along with the decrease of the difference x+;

calculating a value of the repulsion term r- in cases where the undesirable output data is provided
to the learning apparatus as the teaching signal data, a value of the repulsion term r- being decreased
along with the increase of the difference x-;

calculating a loss function r found by adding the repulsion term r- and the attraction term r+
together;

iteratively calculating the value of the loss function r by renewing the values of both the control
parameters and the parameter σ to decrease the value of the loss function r until the value of the loss
function r is less than a prescribed value in cases where the value of the loss function r is equal to or
greater than the prescribed value; and

adjusting both the control parameters and the parameter σ of the learning apparatus to newest
values of both the control parameters and the parameter σ when the value of the loss function r is less
than the prescribed value.
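
The generalized Gaussian distribution of claim 18 and the two terms derived from it can be checked numerically with the short Python sketch below. The function names are illustrative assumptions. With b = 2 the density reduces to the normal distribution, and the attraction term equals -ln Ga(x+) with the constant ln 2 dropped.

```python
import math


def generalized_gaussian(x, sigma, b):
    """Ga(x) = {2 b^(1/b) sigma Gamma(1/b + 1)}^(-1) * exp{-|x|^b / (b sigma^b)}."""
    norm = 2.0 * b ** (1.0 / b) * sigma * math.gamma(1.0 / b + 1.0)
    return math.exp(-abs(x) ** b / (b * sigma ** b)) / norm


def attraction_term(x_plus, sigma, b):
    """r+ = ln(sigma) + (1/b) ln(b) + ln Gamma(1/b + 1) + |x+|^b / (b sigma^b)."""
    return (math.log(sigma) + math.log(b) / b + math.lgamma(1.0 / b + 1.0)
            + abs(x_plus) ** b / (b * sigma ** b))


def repulsion_term(x_minus, sigma, b, s):
    """r- = s * Ga(x-)."""
    return s * generalized_gaussian(x_minus, sigma, b)
```

The parameter b controls the shape of the distribution: values below 2 give heavier tails than the normal distribution, and values above 2 give a flatter peak with lighter tails.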


19. A learning method for adjusting control parameters θ such as weight parameters and threshold values of
a learning apparatus by utilizing teaching signal data and a plurality of neurons interconnected in an
artificial neural network in which input data provided to first neurons (22) of a first stage (21) is weighted
with the weight parameters and the threshold values are subtracted from the weighted input data so
that the weighted input data is transmitted to final neurons (26) of a final stage (25) through hidden
stages (24) in which the data is weighted with the weight parameters, reduced by the threshold values,
and converted by applying a prescribed monotone increasing function, after which output data is
provided from the final neurons (26), comprising steps of:

defining undesirable output data which is not acceptable as output data and is a type of the
teaching signal data; 

defining desired output data which is desired as output data and is another type of the teaching 
signal data, the desired output data occupying a region s; 

defining a distribution function s·Ga(x+) of the desired output data by utilizing both the region s
occupied by the desired output data and a generalized Gaussian distribution Ga(x) which is defined by
a parameter σ, a Gamma function Γ, a parameter b, and a difference x+ between the output data and
the desired output data, the distribution function s·Ga(x+) being defined by an equation

\[ s \cdot Ga(x_+) = s \cdot \left\{2\, b^{1/b}\, \sigma\, \Gamma(1/b + 1)\right\}^{-1} \exp\left\{-\frac{|x_+|^{b}}{b\,\sigma^{b}}\right\} \];

defining a distribution function 1 - s·Ga(x-) of the undesirable output data by utilizing both the
region s and the generalized Gaussian distribution Ga(x) which is defined by the parameter σ, the
Gamma function Γ, the parameter b, and a difference x- between the output data and the undesirable
output data, the distribution function 1 - s·Ga(x-) being defined by an equation

\[ 1 - s \cdot Ga(x_-) = 1 - s \cdot \left\{2\, b^{1/b}\, \sigma\, \Gamma(1/b + 1)\right\}^{-1} \exp\left\{-\frac{|x_-|^{b}}{b\,\sigma^{b}}\right\} \];

providing the input data to the first neurons (22); 
providing the teaching signal data to the learning apparatus;

calculating a value of the output data obtained by converting the input data while utilizing the 
control parameters; 

deriving an attraction term r+ and a repulsion term r-,

\[ r_+ = \ln\sigma + \frac{1}{b}\ln b + \ln\Gamma(1/b + 1) + \frac{|x_+|^{b}}{b\,\sigma^{b}}, \]
\[ r_- = s \cdot \left\{2\, b^{1/b}\, \sigma\, \Gamma(1/b + 1)\right\}^{-1} \exp\left\{-\frac{|x_-|^{b}}{b\,\sigma^{b}}\right\}, \]

from a logarithmic-likelihood l_E(x) based on a statistical entropy as follows,

\[ -l_E(x) = -\ln Ga(x_+) + s \cdot Ga(x_-) \]
\[ r_+ = -\ln Ga(x_+) \]
\[ r_- = s \cdot Ga(x_-) \];

calculating a value of the attraction term r+ in cases where the desired output data is provided to
the learning apparatus as the teaching signal data, a value of the attraction term r+ being decreased
along with the decrease of the difference x+;

calculating a value of the repulsion term r- in cases where the undesirable output data is provided
to the learning apparatus as the teaching signal data, a value of the repulsion term r- being decreased
along with the increase of the difference x-;

calculating a loss function r found by adding the repulsion term r- and the attraction term r+
together;

iteratively calculating the value of the loss function r by renewing the values of the control
parameters and the parameter σ to decrease the value of the loss function r until the value of the loss
function r is less than a prescribed value in cases where the value of the loss function r is equal to or
greater than the prescribed value; and

adjusting the control parameters and the parameter σ of the learning apparatus to newest values of
the control parameters and the parameter σ when the value of the loss function r is less than the
prescribed value.

20. A learning method for adjusting control parameters of a learning apparatus in which many items of
input data are converted into N items of output data by utilizing the control parameters and teaching
signal data, comprising steps of:

defining an output vector designated by N items of output data in N dimensions, the output vector
indicating output coordinates whose components are equal to values of the output data;

defining boundary specifying data which specifies a boundary surface dividing a desired region
from an undesired region in N-dimensional space, the boundary specifying data being a type of
teaching signal data;

defining a loss function which decreases when the output coordinates indicated by the output
vector are shifted toward the desired region from the undesired region specified by the boundary
specifying data;

providing both the input data and the boundary specifying data to the learning apparatus;

calculating the output vector designated by the output data obtained by converting the input data
by utilizing values of the control parameters;

calculating a value of the loss function by utilizing both the output coordinates indicated by the
calculated output vector and the boundary surface specified by the boundary specifying data;

iteratively calculating the value of the loss function by renewing the values of the control
parameters to decrease the value of the loss function until the value of the loss function is less than a
prescribed value in cases where the value of the loss function is equal to or greater than the prescribed
value; and

adjusting the control parameters of the learning apparatus to newest values of the control
parameters when the value of the loss function is less than the prescribed value.

21. A learning method according to claim 20 in which the boundary specifying data specified in the step of
defining boundary specifying data consists of a normal vector v for specifying a plane direction of the
boundary surface and for directing towards the desired region from the undesired region, and a
boundary vector y for specifying a point through which the boundary surface passes.
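
A planar boundary specified by a normal vector v and a boundary vector y, as in claim 21, can be illustrated with the Python sketch below. The signed distance follows directly from the two vectors. The softplus form of the loss is purely an assumption made for illustration: the claims only require a loss that decreases as the output coordinates shift toward the desired region, and any such function would serve. All names are hypothetical.

```python
import math


def signed_distance(output, normal, boundary_point):
    """Distance of the output coordinates from the plane through boundary_point.

    Positive values lie on the side the normal vector points to,
    that is, in the desired region.
    """
    length = math.sqrt(sum(v * v for v in normal))
    return sum(v * (o - y) for v, o, y in zip(normal, output, boundary_point)) / length


def boundary_loss(output, normal, boundary_point):
    """Illustrative loss ln(1 + exp(-d)), written in a numerically stable form.

    Assumption: this particular function is not taken from the specification.
    """
    d = signed_distance(output, normal, boundary_point)
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
```

With `normal = (1.0, 0.0)` and `boundary_point = (2.0, 0.0)`, an output at (3.0, 5.0) lies in the desired region and yields a smaller loss than an output at (0.0, 0.0), which lies in the undesired region.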

22. A learning method for adjusting both weight parameters and threshold values of a learning apparatus by
utilizing teaching signal data and a plurality of neurons interconnected in an artificial neural network in
which input data provided to first neurons (22) of a first stage (21) is weighted with the weight
parameters and the threshold values are subtracted from the weighted input data so that the weighted
input data is transmitted to final neurons (26) of a final stage (25) through hidden stages (24) in which
the data is weighted with the weight parameters, reduced by the threshold values, and converted by
applying a prescribed monotone increasing function, after which output data is provided from the final
neurons (26), comprising steps of:

defining an output vector designated by N items of output data in N dimensions, the output vector
designating output coordinates whose components are equal to values of the output data;

defining boundary specifying data which specifies a boundary surface dividing a desired region 
from an undesired region in N-dimensional space, the boundary specifying data being a type of the 
teaching signal data; 

defining a loss function which decreases when the output coordinates designated by the output
vector are shifted toward the desired region from the undesired region specified by the boundary
specifying data;

providing the input data to the first neurons (22);

providing the boundary specifying data to the learning apparatus;

calculating the output vector designated by the output data obtained by respectively weighting the 
input data with the weight parameters and subtracting the threshold values from the weighted input 
data; 

calculating a value of the loss function by utilizing both the output coordinates designated by the 
calculated output vector and the boundary surface specified by the boundary specifying data; 

iteratively calculating the value of the loss function by renewing the weight parameters and the 
threshold values to decrease the value of the loss function until the value of the loss function is less 
than a prescribed value in cases where the value of the loss function is equal to or greater than the 
prescribed value; and

adjusting the weight parameters and the threshold values of the learning apparatus to the newest 
weight parameters and the newest threshold values when the value of the loss function is less than the 
prescribed value. 
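
The data conversion recited in the preamble of claim 22 (weighting, subtraction of a threshold value, and application of a monotone increasing function) can be sketched in Python as follows. The sigmoid is an assumed choice of monotone increasing function, applying it in every stage is likewise an assumption, and the list-based layout of weights and thresholds is hypothetical.

```python
import math


def sigmoid(u):
    """An assumed monotone increasing function."""
    return 1.0 / (1.0 + math.exp(-u))


def stage(inputs, weights, thresholds):
    """One stage: weight the inputs, subtract the threshold, apply the function.

    weights holds one row of weight parameters per neuron of the stage.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) - th)
            for row, th in zip(weights, thresholds)]


def forward(inputs, stages):
    """Pass the input data through all stages and return the output vector."""
    data = inputs
    for weights, thresholds in stages:
        data = stage(data, weights, thresholds)
    return data
```

The list returned by `forward` plays the role of the N-dimensional output vector whose coordinates are compared with the boundary surface when the loss function is evaluated.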
